Ills of the system


Reform is long overdue for Germany’s archaic medical-education system, which puts undue
pressure on students and contaminates the scientific literature.


Of all the problems on the desk of the German defence minister,
Ursula von der Leyen, accusations of plagiarism in her
quarter-century-old medical thesis may not seem to rank very highly.
Yet similar allegations claimed the scalp of her predecessor. 

Although plagiarism is a universal plague in academia, Germany has 
its own distinct circumstances. Almost uniquely among nations, most 
German medical students must squeeze out a doctoral thesis during 
their years of full-time training. Many of these theses, not surprisingly, 
are not very good. Corners are cut and quality suffers. 

The high-profile case of von der Leyen’s 1990 dissertation, first 
publicized in September by the web platform VroniPlag Wiki, which 
searches theses for plagiarism, should bring change — but not in the 
government. It is Germany’s antiquated medical-education system 
that must be reformed. 

Von der Leyen — who denies misconduct charges and has asked 
Hanover Medical School, where she studied, to investigate — is hardly 
alone. Evidence that the system of medical doctorates is failing has 
been accumulating for years. 

Thousands of these dissertations are produced every year in 
Germany and plagiarism is far from the only problem. The DFG, 
Germany’s research agency, and the Wissenschaftsrat, its high-level 
science council, have over the years drawn attention to more fundamental
problems, such as study design and analysis. Some experts
privately say that most medical theses are scientifically valueless. 

Germany justifiably takes pride in its long tradition, and high 
standards, in science. So what is going so badly wrong in its medical
faculties? In most countries, medical students receive their
medical degree, and the ‘Dr’ title, after successfully completing
both preclinical undergraduate studies and clinical training, and 
then passing a state examination. Not so in Germany, where the 
degree gives them only the right to practise medicine — not to title 
themselves Dr. To acquire that honour, an extra step is required: a 
research project leading to a thesis, done, written up and published 
in the student's spare time. Most students choose to do this: after 
all, what ill person wants to visit a doctor who does not bear that 
title? But in the busy, frequently self-important, world of the clinical 
sciences, supervision is often inadequate. 

In 2004, the Wissenschaftsrat called for an end to this system and 
the laxness that it actively encourages. It recommended that medical
students get their medical degree and doctor title automatically,
without having to do a research thesis. Students with genuine interest
in medical science, it said, should have the option of taking time out
to do a PhD to the same standards as other sciences.

Because the Wissenschaftsrat includes representatives of federal 
and state governments as well as top scientists, its recommendations 
are usually implemented. But the call for the automatic degree and 
title — which would require a change of federal law — fell on deaf ears. 
Medical faculties ignored it, although many have established graduate
schools to make available an alternative route to a high-quality PhD.
The value of those recommendations has not changed in the past 
decade, however. Good graduate colleges for the medical sciences 
are fundamental to the drive to speed basic-research discoveries into 
the clinic, an ambition that requires research-savvy physicians. But 
it makes no sense to maintain the requirement for a quick-and-dirty 
thesis, which adds stress to medical students who are already under 
immense pressure, while teaching them little beyond the dangerous
lesson that it is acceptable for medical science to be sloppy.

In 2010, the DFG published a strongly worded report calling for
scientific standards in medical dissertations to be raised, and
earlier this year the German Rectors’ Conference (HRK) established a
task force to look into the problem. However, like the DFG and the
Wissenschaftsrat, the HRK will be able to do no more than make
recommendations. Medical faculties and the profession in general now
have to decisively shed their reluctance to abandon their aberrant
doctoral system. They should do so, before the public shame becomes
unbearable. How many medical theses need be exposed on VroniPlag Wiki
(which already hosts dozens of examples, some quite brazen) before the
bankruptcy of the
system is accepted? 

Plagiarism can never be defended. But the pressures on medical 
students — many of whom do not resort to plagiarism in response 
— make the temptation to indulge understandable. Von der Leyen 
may simply have been a student of her times — times that now have 
to change.




Care for the carers 


Researchers should add their voices to the effort 
to stop attacks on health workers in war zones. 


As the world this week commemorates the armistice that ended
the First World War in 1918, it is reprehensible that humanitarian
rules forged in the suffering and bloodshed of battle are often
being violated in contemporary conflicts. In the past month alone, two
hospitals run by Médecins Sans Frontières (MSF; also known as Doctors
Without Borders) were hit by air strikes. US warplanes destroyed one
in Kunduz in Afghanistan, killing 13 MSF staff and 17 others, and
another in Yemen was targeted, allegedly by Saudi-led coalition forces.
These are not isolated incidents, but part of a string of violations of
a fundamental part of international humanitarian law: that warring
parties must consider the wounded and the medical staff who care for
them as neutral, and protect them from harm. 

The public and the media must increase calls for political and 
diplomatic pressure to help to prevent such attacks. The scientific community,
and in particular biomedical and clinical researchers and the
professional bodies that represent them, must add their voices to this 
timely and important matter. 

The need for ground rules in conflicts has been recognized since 
antiquity, but today’s international humanitarian laws have their roots in 
the work of the nineteenth-century Swiss businessman, Henry Dunant. 
Horrified by the thousands of wounded left untreated and dying on the 
battlefield after the French and Sardinians crushed the Austrian army 
at Solferino in Italy in 1859, he proposed that states should allow, and 
protect, humanitarian volunteers to care for those who are wounded. 

In 1863, he helped to found what was to become the International
Committee of the Red Cross (ICRC). Dunant’s efforts spurred
16 countries to agree the following year to the first internationally
codified rules of war: the first Geneva Convention for the Amelioration
of the Condition of the Wounded and Sick in Armed Forces in
the Field. As well as granting neutral status to medical staff, it obliged 
warring parties to care for wounded enemy prisoners. 

As the nature of warfare has changed, so the wording and scope of 
the Geneva Conventions have been regularly revised — for example in 
1949 to better protect civilians. The principle of medical neutrality is 
more relevant today than ever, but it is under increasing threat. 

Syria, where conflict sparked in 2011, is by far the worst case. As 
of the end of September, 313 attacks on 227 medical facilities had 
been reported — 283 of them carried out by government forces, often 
using indiscriminate ‘barrel bombs’ dropped from helicopters. Over 
the same period, 679 medical staff have been killed, almost all by government
forces, and scores of others have been arrested, imprisoned
or tortured. The regime has also deployed chemical weapons. The 
health system has been all but destroyed in large parts of the country. 

During peaceful protests in Turkey in 2013 and 2014, the government
used violence against clinics and medical staff, and health workers
have been arrested and charged with assisting criminals simply for 
having treated wounded protestors. Similarly, during protests against 
the government in Bahrain in 2013, doctors and nurses were fired 
from civil-service posts, then arrested and jailed on the same grounds as
those in Turkey. Dozens of workers dispensing polio vaccinations have 
been assassinated in Pakistan and Nigeria. The ICRC has identified 
almost 2,000 incidents of violence against patients, health workers and 
medical facilities in 23 countries in 2012 and 2013 alone. 


These are estimates, but comprehensive monitoring of violations 
and data are both lacking. However, Susannah Sirkin, director of international
policy and partnerships for the humanitarian group Physicians
for Human Rights, based in New York City, points out that “we
can safely say that the bombing of hospitals and deliberate killing of 
hundreds of medics, especially in Syria, is something more extreme 
and extensive than we have ever seen”. 

Among the explanations is a lack of awareness of the Geneva
Conventions by protagonists (in what are increasingly not wars
between nations, but smaller civil and sectarian wars, often involving
non-state actors), but also a poor grasp by the media and public.
People may have “become inured to the extraordinary level of targeting
of civilians in many conflicts in the past few decades and simply shrug
at the inclusion of medical facilities as regular targets”, adds Sirkin.
What is worrying, she says, is that the overt targeting of humanitarian
and health workers has become the “new normal”, despite it being illegal
under international law, and having the effect of depriving entire
populations of health care, and children of vital vaccinations.

But above all, abuses happen because there is little accountability, 
with perpetrators operating with almost total impunity, despite their 
actions often clearly amounting to war crimes — or indeed crimes 
against humanity. The Geneva Conventions lack a body with teeth to 
ensure that the rules are respected, or to stop abuses when they are under 
way. They also lack mechanisms to investigate and prosecute abuses. 

Accountability has also suffered because many of those affected are 
voiceless. MSF, by contrast, has both political clout and moral authority, 
and, for example, is robustly and rightly pressing for an independent 
international fact-finding commission under the Geneva Conventions 
into the attacks on its facilities. 

Momentum to stop the attacks, led by campaigns from humanitarian 
groups, is building within civil society. Meanwhile, Ban Ki-moon, 
the secretary-general of the United Nations, and Peter Maurer, the 
president of the ICRC, last week issued a joint warning about the 
unprecedented level of violations of international humanitarian law 
in ongoing conflicts. 

As well as the armistice, this month marks 100 years since the 
decision to evacuate troops from the ill-fated 1915 Gallipoli campaign, 
in which medical staff working under atrocious battlefield conditions 
suffered extensive casualties. The world has been shocked into action 
to protect health workers before. It must be again.




Smooth operator 


A tribute to the nineteenth-century polymath 
whose algebra lets you search the Internet. 


IF George Boole had lived, then he would have celebrated his 200th
birthday this week. NOT that it makes any sense to say such a thing
OR to write it. People do NOT live that long. And if there was one
thing that George Boole is known for, it is logic. AND mathematics
AND philosophy. Three things. NOT one thing.

The combination of mathematics AND logic AND philosophy is
NOT easy for many people to follow OR understand. So Boole is usually
associated with the three words NOT AND OR. They are called
Boolean operators AND they can be combined to make AND NOT.
That's because the Boolean operator OR does NOT really mean OR,
which usually means AND NOT.

It is NOT always easy to follow these logical constructions when
they are written in words. That is why so many people call George
Boole a genius. Because he did NOT have the same trouble. AND
because he invented them OR applied them to mathematics. Without 
George Boole, people say that the modern world would NOT have 
been the same, with no computers OR electronics. Although a nice 
thing to say, it is probably NOT true AND someone else could have 
come up with the idea OR something similar. After all, Boole himself 
is a good example, who shows that ideas AND NOT inventions can 
come from an unlikely source. 

He did NOT have a formal education OR academic training. He 
taught himself languages including Latin AND Greek AND calculus.
He wrote scientific papers on how to represent logical relations
as symbols AND algebraic equations. Despite NOT having a university
education, he was appointed professor of mathematics at Queen's
College Cork in Ireland. 

The weather in Ireland is often NOT dry and Boole caught pneumonia
after walking to the college in heavy rain. His wife, Mary, a prominent
mathematician, was NOT as skilled at medicine. She soaked her husband’s
sheets with water AND made him shiver with cold. It did NOT help AND,
sadly, he died.
WORLD VIEW


Avoid major disasters by
welcoming minor change


Scientists can educate policymakers on how to deal with the European refugee
crisis: it’s all about alleviating the pressure, says Len Fisher.


What can a 70-year-old book on how to play bridge tell us
about addressing the ongoing refugee crisis in Europe? And
what does it have to do with King Lear?

In Shakespeare's play, the Duke of Albany warns that “striving to
better, oft we mar what’s well”. In the search for a solution, in other
words, we can let the perfect become the enemy of the good. In his
1945 book Why You Lose at Bridge, S. J. Simon called it the half-loaf
strategy: the most successful players aim for the best possible result,
rather than the best result possible.

In human and political crises, the best possible result is often one of
damage limitation: an outcome that avoids or delays the chance of
a large-scale and catastrophic change. So, the question then becomes:
how can we achieve such an outcome?

In a recent editorial (see Nature 525, 157; 2015), Nature suggests that
nations should “keep a welcome” for refugees. I agree. This
pressure-releasing approach could serve as an effective paradigm for
policy development, being used to handle emergent crises of many types.
It makes sense: not just from a humanitarian perspective but also from
what we now understand about the underlying behaviour of our
interconnected global socio-economic-ecological system. For example,
convincing connections have been drawn among the European refugee
crisis, global warming and future food supplies, so a solution to one
problem is likely to bear on the others.

Such complex systems can undergo sudden change at any time. These
changes (known technically as ‘regime shifts’, ‘critical transitions’ or
‘catastrophic bifurcations’) occur at all scales, happen with little
warning and often have no apparent cause. They frequently seem to be out
of our control. Examples include cascading failure in power grids,
communication networks, financial systems, food webs and social
organizations; epidemics, not just of disease but also of social unrest
and innovation; and sudden shifts in the balance of power, be they in
international relationships or small groups.

Policy development to deal with such sudden change is often based
on searching for (or blaming) specific causes. But, to quote H. G. Wells,
“History is a race between education and catastrophe.” Scientists must
show policymakers that sudden change is inevitable in any complex
system, and the first step towards avoiding or minimizing catastrophe
is to recognize this.

The second step is to understand the nature of these transitions.
Scientists have modelled them as, for example, sudden slippages in a
sand pile when extra grains are progressively added, or as evolving
interactions of multiple positive and negative feedback loops in a
system. An important common feature of these models is that they
predict that smaller changes are more frequent than larger ones.
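
The sand-pile picture can be made concrete with a toy simulation. The sketch below is an illustration only and is not taken from the article or from the author's own work: it assumes the textbook Bak-Tang-Wiesenfeld rules (a cell holding four or more grains topples and passes one grain to each neighbour, with grains falling off the edges), and the function name, grid size, number of grains and random seed are arbitrary choices made here. It records how many topplings each added grain sets off, so that the counts of small and large avalanches can be compared.

```python
import random


def run_sandpile(size=15, grains=5000, seed=0):
    """Drop grains one at a time on a size x size grid.

    Returns (sizes, grid): sizes[k] is the number of topplings
    triggered by the k-th grain; grid is the final height map.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    grid = [[0] * size for _ in range(size)]
    sizes = []
    for _ in range(grains):
        r, c = rng.randrange(size), rng.randrange(size)
        grid[r][c] += 1
        topplings = 0
        unstable = [(r, c)] if grid[r][c] >= 4 else []
        while unstable:
            i, j = unstable.pop()
            if grid[i][j] < 4:
                continue  # already relaxed by an earlier toppling
            grid[i][j] -= 4
            topplings += 1
            if grid[i][j] >= 4:
                unstable.append((i, j))
            # Pass one grain to each neighbour; grains leaving the grid are lost.
            for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if 0 <= ni < size and 0 <= nj < size:
                    grid[ni][nj] += 1
                    if grid[ni][nj] >= 4:
                        unstable.append((ni, nj))
        sizes.append(topplings)
    return sizes, grid


if __name__ == "__main__":
    sizes, _ = run_sandpile()
    small = sum(1 for s in sizes if 1 <= s <= 10)
    large = sum(1 for s in sizes if s > 100)
    print("avalanches of 1-10 topplings:", small)
    print("avalanches of more than 100 topplings:", large)
```

Tallying the returned sizes gives the distribution of event sizes for this toy model. The thresholds used to call an avalanche small or large are arbitrary, but in runs of this kind the small events should far outnumber the large ones.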

Scientists have also suggested a number of different ways to develop 
policies to deal with the potential for sudden change. When it comes to 
protecting against terrorist attacks, some suggest that we must concentrate
resources to protect critical nodes in a network. On the stability of
banking systems, researchers argue for structural changes in networks 
so that damage in one part cannot easily propagate to others. A third 
idea, which to some extent complements the first two, is to build more 
resilience into our societies and institutions. 

These ideas have their merits, but the ‘keep a welcome’ strategy for
the refugee crisis suggests a different approach — one that can more 
easily be adapted to take account of the important
(and sometimes overwhelming) human
dimension in many crises. 

This approach, which my colleague Jim 
Gimzewski and I have been examining, involves 
reducing the chances of sudden, large-scale, damaging
change by altering the shape of the statistical
distribution of event sizes. We should promote
smaller, less-damaging transitions to reduce the
chance of larger ones occurring. In metaphorical
terms, the aim should be to reduce pressures
before they can build to dangerous levels. 

This is not a new principle. It underpins, for 
instance, the practice of triggering small snow 
avalanches to reduce the probability and impact 
of a major one, and it also has parallels with the 
philosopher Karl Popper's idea of ‘piecemeal
social engineering’. A simple social example is
the reduction of traffic congestion by breaking 
the traffic into manageable blocks that are separated and accompanied 
by slow-moving police cars. Progress is still slow, but on average it is 
much faster than if large traffic jams were allowed to develop. 

Most social problems, of course, are not quite this simple. But we 
must be wary of the ‘nirvana effect’ — the belief that perfect solutions 
are out there somewhere. A half-loaf of bread is always better than 
none at all. Thus, for example, in the case of the Greek debt crisis, 
our approach suggests paying to create jobs, rather than imposing 
austerity. The cost of the former would be far less than the social and 
economic costs that may result from the latter. 

The current refugee crisis falls into a similar category. Countries 
willing to bear the (financial and political) cost of welcoming more 
refugees with fewer restrictions would promote small-scale changes 
that release the build-up of devastating social pressures. In this way, 
scientific and humanitarian values can work hand in hand.


Len Fisher is a visiting research fellow at the University of Bristol, UK. 
e-mail: len.fisher@bristol.ac.uk 
RESEARCH HIGHLIGHTS


Selections from the
scientific literature


ECOLOGY 


Carnivores curbed 
mammoth numbers 


Sabre-toothed cats and other 
large carnivores were probably 
able to hunt down young 
mammoths and mastodons 
during the Pleistocene epoch, 
between 2.6 million years and 
12,000 years ago. That would 
explain why Earth's forests 
were not grazed to death by the 
large numbers of big herbivores 
before they went extinct. 
Researchers have long 
thought that mammoths and 
other giant herbivores were 
too large to have predators. 
Blaire Van Valkenburgh of the 
University of California, Los 
Angeles, and her colleagues 
analysed data on the relative 
body masses of modern 
predators and prey, and 
compared them with those 
of fossil specimens. They 
estimate that some Pleistocene 
predators, such as sabre- 
toothed cats and very large 
hyenas, were big enough to 
kill young megaherbivores 
— enabling them to control 
herbivore populations. 
Proc. Natl Acad. Sci. USA
http://doi.org/8th (2015) 


Related wasps
commit treason


Yellow-jacket wasps live to
serve their mother, the queen,
but will kill her if she fails to
secure more than one mate.

Colonies of yellow-jacket
wasps (Dolichovespula
arenaria; pictured) have a
single queen that generates
female workers, which rarely
reproduce, and reproductive
males. But Kevin Loope
at Cornell University in
Ithaca, New York, found that
just under half of colonies
eventually revolt, with the
workers killing their queen and
producing their own males. To
find out why, Loope collected
wasp nests and measured the
workers’ relatedness. Matricide
was most common in colonies
where workers were more
closely related to each other.

This means that the queen
had only one mate, making
workers less closely related to
the queen’s sons than to the
sons of other workers. Workers
prefer males that are more
closely related to them, so it
benefits them to overthrow the
queen and produce their own
sons.

Curr. Biol. http://doi.org/8vz
(2015)


Beads dance on sound waves


A bank of speakers can grip, move and rotate
particles in air from one side (pictured).

Sound has been used to levitate small
objects, but single-sided devices offered
little manoeuvrability. Asier Marzo at the
Public University of Navarre in Pamplona,
Spain, and his colleagues used a flat array
of 64 loudspeakers to levitate beads of
polystyrene up to 3 millimetres wide. The
authors used algorithms to create interference
patterns in waves of ultrasound that formed
regions of high and low intensity (shaped
as tweezers, tornadoes or bottles), which
allowed them to trap and then move the
particles in various directions.

The device could be used to manipulate
particles for targeted drug delivery or to operate
tiny surgical devices from outside the body, say
the authors.

Nature Commun. 6, 8661 (2015)


Arctic open-water
season grows


Ice could cover Arctic coastal
regions for only half the year by
the 2070s, if human-induced
climate change continues.

Most of these areas are
now covered in ice for more
than half the year, and even
all year in some places.
Using data on daily sea-ice
concentrations, Katherine
Barnhart at the University of
Colorado Boulder and her
colleagues mapped changes in
the Arctic’s open-water season
since pre-industrial times,
and used models to project
future changes. They found
that throughout the Arctic, the
season began to lengthen in
the 1990s, with ice break-up
starting earlier and freeze-up
setting in later. In business-as-usual
climate-change scenarios,
the models indicate that the
duration of open-water seasons
for much of the region will
start to exceed pre-industrial
bounds by the middle of this
century.

The expansion of the
open-water season will
affect all aspects of the Arctic
environment that depend
on sea-ice coverage, such as
polar-bear foraging and the
livelihoods of indigenous
people, the authors say.
Nature Clim. Change
http://dx.doi.org/10.1038/nclimate2848 (2015)


Altered T cells hit 
pancreatic cancer 


Genetically engineered 
immune cells that target 
a protein found on some 
pancreatic tumours can 
penetrate that cancer’s 
defences, according to 
studies in mice. 

Harnessing engineered 
T cells to combat cancer has 
been more successful for 
blood cancers than for solid 
tumours, such as those of the 
pancreas, which are protected 
by a dense cellular barrier and 
are particularly deadly. Philip 
Greenberg and Sunil Hingorani 
of the Fred Hutchinson 
Cancer Research Center in 
Seattle, Washington, and their 
colleagues engineered T cells 
to recognize a protein called 
mesothelin that is associated 
with the spread of certain 
pancreatic tumours. The 
engineered T cells were able 
to bind to this protein more 
tightly than did normal T cells. 

The engineered cells 
infiltrated pancreatic tumours 
in mice, leading to an 
increase in tumour-cell death 
compared with control mice. 
Mice that received a series of 
engineered T-cell infusions 
lived nearly twice as long as 
those that did not. 
Cancer Cell http://dx.doi.org/10.1016/j.ccell.2015.09.022
(2015)


IMMUNOLOGY 


Worms conspire 
with gut microbes 


Intestinal worms manipulate
their host’s immune system to
ensure their survival, in part by
changing the metabolism of the
host's gut microbiome.

The worms, called 
helminths, infect around 
2 billion people around the 
world, and are able to block 
harmful inflammatory 
responses in humans and mice. 
Nicola Harris at the Swiss 
Federal Institute of Technology 
in Lausanne and her colleagues 
studied mice infected with the 
helminth Heligmosomoides 
polygyrus bakeri, and found 
that mice that had been treated 
with antibiotics to kill gut 
bacteria before being exposed 
to the worms had more allergic 
airway inflammation than 
did untreated, worm-infected 
animals. Worm infection 
caused the microbiota to 
produce increased levels of 
short-chain fatty acids in 
mice, pigs and six out of eight 
human volunteers. The
anti-inflammatory effects of worm
infection were lost in mice that
had been engineered to lack a
receptor for the fatty acids.

The findings suggest that 
helminths and gut microbes 
have evolved this mechanism 
to regulate the host immune 
system over many millions of 
years, the authors say. 

Immunity http://dx.doi.org/10.1016/j.immuni.2015.09.012
(2015)


DEVELOPMENTAL BIOLOGY 


Survival boost for 
cloned embryos 


Researchers have improved 
the success rate for producing 
cloned embryos or embryonic 
stem cells by removing a 
chemical group from DNA- 
binding proteins. 
Transferring a nucleus from 
an individual's adult body cell 
into a human egg — a process 
called somatic cell nuclear 
transfer (SCNT) — could one 
day generate embryonic 
stem cells that match 
that person's DNA. But 
embryos made using 
SCNT rarely mature. To 
improve this, Dong Ryul 
Lee at CHA University in 
Seoul, Yi Zhang at Boston 
Children’s Hospital in 
Massachusetts and 
their colleagues 
used a human messenger
RNA encoding a protein that
removes methyl groups from a
type of histone protein found
on DNA in the donor nucleus.
When the authors injected the
RNA into 56 human eggs that
had received donor DNA, they
found that 14.3% of the treated
embryos developed into
late-stage blastocysts, compared
with none of the untreated
controls.

Using this technique, the
team derived embryonic stem
cells from skin cells donated
by people with age-related
macular degeneration, which
causes partial vision loss.

Cell Stem Cell http://doi.org/8v2
(2015)


SOCIAL SELECTION


Popular topics
on social media


Funding of basic science stirs debate


Pure science does not always stimulate innovation; rather,
technological change often springs naturally from human
inventiveness. Writer Matt Ridley makes this provocative
point in a 23 October essay in The Wall Street Journal called
“The Myth of Basic Science” (go.nature.com/2bbqpg) that
fuelled heated and thoughtful responses on social media
about the role and benefits of science and technology. Ridley
says that government-funded basic research is not the only
path towards innovations that improve society.

But others countered that publicly funded research has
many benefits. “The causes of technical and social change
are manifold, and scientific research forms just part of the
ecosystem, but this doesn’t make it inconsequential,”
wrote Jack Stilgoe, a science-policy expert at University
College London, in an article for The Guardian commenting
on Ridley’s essay (go.nature.com/zkkalt).
Ridley responded to his critics on Twitter,
saying that basic research is important
but that government is not the only way
to fund it.

For more on popular papers: go.nature.com/tgdjzg


Extra dimensions 
in 3D printing 


Two research groups 
have used magnetic 
fields to tune the texture 
and strength of materials 
as they are being printed, 
allowing the formation of 
complex 3D structures. 
André Studart and
his colleagues at
the Swiss Federal
Institute of Technology in
Zurich added magnetic
particles at different
concentrations to resins
of varying viscosities.
Applying a low magnetic
field during the 3D printing
process allowed the team to
control the orientation of
the particles, and hence the
texture, within the printed
object. The researchers used
their technique to create a
composite with an intricate
internal spiral staircase
(pictured). Their system
could be used in robotics
to print shape-changing
objects that respond to
environmental triggers,
Studart says.

In a separate paper, 
Studart’s former postdoc, 
Randall Erb, and his team 
at Northeastern University 
in Boston, Massachusetts, 
used the magnetic technique 
to improve the mechanical 
strength of 3D printed objects 
by controlling crack formation. 
Nature Commun. 6, 8643 (2015); 
8641 (2015) 


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The expansion of the 
open-water season will 
affect all aspects of the Arctic 
environment that depend 
on sea-ice coverage, such as 
polar-bear foraging and the 
livelihoods of indigenous 
people, the authors say. 
Nature Clim. Change 
http://dx.doi.org/10.1038/ 
nclimate2848 (2015) 


Altered T cells hit 
pancreatic cancer 


Genetically engineered 
immune cells that target 
a protein found on some 
pancreatic tumours can 
penetrate that cancer’s 
defences, according to 
studies in mice. 

Harnessing engineered 
T cells to combat cancer has 
been more successful for 
blood cancers than for solid 
tumours, such as those of the 
pancreas, which are protected 
by a dense cellular barrier and 
are particularly deadly. Philip 
Greenberg and Sunil Hingorani 
of the Fred Hutchinson 
Cancer Research Center in 
Seattle, Washington, and their 
colleagues engineered T cells 
to recognize a protein called 
mesothelin that is associated 
with the spread of certain 
pancreatic tumours. The 
engineered T cells were able 
to bind to this protein more 
tightly than did normal T cells. 

The engineered cells 
infiltrated pancreatic tumours 
in mice, leading to an 
increase in tumour-cell death 
compared with control mice. 
Mice that received a series of 
engineered T-cell infusions 
lived nearly twice as long as 
those that did not. 
Cancer Cell http://dx.doi.org/ 
10.1016/j.ccell.2015.09.022 
(2015) 


IMMUNOLOGY 


Worms conspire 
with gut microbes 


Intestinal worms manipulate 
their host’s immune system to 
ensure their survival, in part by 
changing the metabolism of the 


host's gut microbiome. 

The worms, called 
helminths, infect around 
2 billion people around the 
world, and are able to block 
harmful inflammatory 
responses in humans and mice. 
Nicola Harris at the Swiss 
Federal Institute of Technology 
in Lausanne and her colleagues 
studied mice infected with the 
helminth Heligmosomoides 
polygyrus bakeri, and found 
that mice that had been treated 
with antibiotics to kill gut 
bacteria before being exposed 
to the worms had more allergic 
airway inflammation than 
did untreated, worm-infected 
animals. Worm infection 
caused the microbiota to 
produce increased levels of 
short-chain fatty acids in 
mice, pigs and six out of eight 
human volunteers. The anti- 
inflammatory effects of worm 
infection were lost in mice that 
had been engineered to lack a
receptor for the fatty acids. 

The findings suggest that 
helminths and gut microbes 
have evolved this mechanism 
to regulate the host immune 
system over many millions of 
years, the authors say. 

Immunity http://dx.doi.org/ 
10.1016/j.immuni.2015.09.012 
(2015) 


DEVELOPMENTAL BIOLOGY 


Survival boost for 
cloned embryos 


Researchers have improved 
the success rate for producing 
cloned embryos or embryonic 
stem cells by removing a 
chemical group from DNA- 
binding proteins. 
Transferring a nucleus from 
an individual's adult body cell 
into a human egg — a process 
called somatic cell nuclear 
transfer (SCNT) — could one 
day generate embryonic 
stem cells that match 
that person's DNA. But 
embryos made using 
SCNT rarely mature. To 
improve this, Dong Ryul 
Lee at CHA University in 
Seoul, Yi Zhang at Boston 
Children’s Hospital in 
Massachusetts and 
their colleagues
used a human messenger
RNA encoding a protein that
removes methyl groups from a
type of histone protein found
on DNA in the donor nucleus.
When the authors injected the
RNA into 56 human eggs that
had received donor DNA, they
found that 14.3% of the treated
embryos developed into late-
stage blastocysts, compared
with none of the untreated
controls.

Using this technique, the
team derived embryonic stem
cells from skin cells donated
by people with age-related
macular degeneration, which
causes partial vision loss.

Cell Stem Cell http://doi.org/8v2
(2015)


SOCIAL SELECTION

Popular topics
on social media


Funding of basic science stirs debate


Pure science does not always stimulate innovation — rather,
technological change often springs naturally from human
inventiveness. Writer Matt Ridley makes this provocative
point in a 23 October essay in The Wall Street Journal called
“The Myth of Basic Science” (go.nature.com/2bbqpg) that
fuelled heated and thoughtful responses on social media
about the role and benefits of science and technology. Ridley
says that government-funded basic research is not the only
path towards innovations that improve society.

But others countered that publicly funded research has
many benefits. “The causes of technical and social change
are manifold, and scientific research forms just part of the
ecosystem, but this doesn’t make it inconsequential,”
wrote Jack Stilgoe, a science-policy expert at University
College London, in an article for The Guardian commenting
on Ridley’s essay (go.nature.com/zkkalt).

Ridley responded to his critics on Twitter,
saying that basic research is important
but that government is not the only way
to fund it.

> NATURE.COM
For more on popular papers:
go.nature.com/tgdjzg


Extra dimensions 
in 3D printing 


Two research groups 
have used magnetic 
fields to tune the texture 
and strength of materials 
as they are being printed, 
allowing the formation of 
complex 3D structures. 
André Studart and 
his colleagues at 
the Swiss Federal 


Institute of Technology in 
Zurich added magnetic 
particles at different 
concentrations to resins 

of varying viscosities. 
Applying a low magnetic 
field during the 3D printing 
process allowed the team to 
control the orientation of 
the particles, and hence the 
texture, within the printed 
object. The researchers used 
their technique to create a 
composite with an intricate 
internal spiral staircase 
(pictured). Their system 
could be used in robotics 

to print shape-changing 
objects that respond to 
environmental triggers, 
Studart says. 

In a separate paper, 
Studart’s former postdoc, 
Randall Erb, and his team 
at Northeastern University 
in Boston, Massachusetts, 
used the magnetic technique 
to improve the mechanical 
strength of 3D printed objects 
by controlling crack formation. 
Nature Commun. 6, 8643 (2015); 
8641 (2015) 


Reproducibility

A suite of measures should 

be adopted to improve the 
reproducibility of biomedical 
research, according to a report 
released on 29 October by 

the London-based Academy 
of Medical Sciences. The 
report — produced with 

the backing of government 
funders and biomedical- 
research charity the Wellcome 
Trust — says that greater 
openness, preregistration 

of research protocols and 
better use of standards should 
all be considered, although 
there is no single cause of the 
problem of many studies being 
irreproducible. See
go.nature.com/cwynyx for more.


Ozone-hole latest 


This year’s hole in the Antarctic 
ozone layer is the third largest 
ever observed, the World 
Meteorological Organization 
announced on 29 October. 
The hole’s average size over 

30 consecutive days spanning 
September and October was 
26.9 million square kilometres, 
the largest on record after 2000 
and 2006. The agency ascribes 
the increased size to colder- 
than-usual temperatures in 

the polar stratosphere. That 
drove the formation of more 
clouds on whose surfaces 
chlorine can readily convert 

to a form that destroys ozone. 
In the long term, the ozone 
layer is still expected to recover, 
because the 1987 Montreal 
Protocol phased out many 
chemicals that contribute to its 
destruction. 


Chronic fatigue 

The US National Institutes of 
Health is stepping up efforts 

to tackle chronic fatigue 
syndrome, also known as 
myalgic encephalomyelitis 
(CFS/ME). In an 
announcement on 29 October, 
the agency said that it would be
centring its CFS/ME research
programme in the National
Institute of Neurological
Disorders and Stroke. Its
plans include a clinical study
on its campus in Bethesda,
Maryland, that will enrol
patients with sudden-onset
CFS/ME apparently caused by
an infection.


Cassini dips into geysers


NASA's Cassini spacecraft took its deepest dive through the
geysers spurting from Saturn’s moon Enceladus on 28 October.
The mission whizzed 50 kilometres above Enceladus’s south
pole (pictured, bottom), directly through the icy spray coming
from an ocean of liquid water trapped beneath a thick layer
of fractured ice. It was the most direct taste of the water that
Cassini's chemical-analysis sensors will ever get; in the final
fly-by in December, the spacecraft will bypass the geysers.


AWARDS 


Maddox prize 

The 2015 John Maddox Prize 
was awarded to Edzard Ernst 
and Susan Jebb on 3 November. 
Ernst, emeritus researcher at 
the University of Exeter, UK, 
was given the prize for his work 
on the truth, or lack thereof, in 
claims about complementary 
and alternative medicine. Jebb, 
a researcher at the University 
of Oxford, UK, received the
prize for her work in furthering 
public understanding of 
nutrition. The prize for 
promoting science in the face of 
adversity is awarded jointly by 
Nature and the London-based 
charities the Kohn Foundation 
and Sense About Science. It 

is named after the late John 
Maddox, a former editor of 
Nature. 


POLICY 


One-child rule ends 
All couples in China will in 
future be allowed to have 

two children, rather than 

one, the Communist Party 
announced on 29 October. 

But demographers predict 
little effect on population 
growth in China, where many 


women are more focused on 

a career than on having large 
families. The one-child rule 
was introduced in 1979 and 

is thought to have prevented 
almost half a billion births in a
nation whose population now 
numbers 1.4 billion. In recent 
years the rule had been relaxed. 
See go.nature.com/skdr1n for 
more. 


Pathogen rules 


In the wake of a series of
high-profile laboratory 
accidents in 2014, the White 
House issued a 187-page 

set of recommendations on 

29 October for government 
agencies that work with 
dangerous pathogens. They 
include improvements to rules 
for reporting lab accidents and 
maintaining records. 


Antarctic veto 


The body that governs 
Antarctica’s waters again 
failed to agree on plans for 

a protected area in the Ross 
Sea. The Commission for
the Conservation of Antarctic Marine
Living Resources, meeting in
Hobart, Australia, last week, 
has repeatedly considered the 
proposals but failed to reach 
the unanimous agreement 
among nations needed to create 
the area. The Antarctic Ocean 
Alliance, a coalition of non- 
governmental organizations, 
criticized the failure to 
protect the Ross Sea and 
another proposed area in East 
Antarctica. 


GM opt-out block 


The European Parliament 

has rejected a proposal that 
would allow European Union 
member states to restrict the 
importation of genetically 
modified (GM) feeds and foods 
that have been approved at EU 
level. In the 28 October vote, 
members argued that opting 
out of EU-wide agreements to 
allow the sale of GM food was 
incompatible with the EU’s
single market. The European
Commission tabled the
proposal in April after it was
agreed that EU member states
could opt out of cultivating GM
crops, which 19 of the 28 states
have done so far.


SOURCE: QUEEN ELIZABETH PRIZE FOR
ENGINEERING CREATE THE FUTURE (2015).


FUNDING 
Brain project 


The European Commission 
signed a partnership agreement 
with the ambitious but 
controversial Human Brain 
Project (HBP) on 30 October. 
The agreement will take the 
project into its fully operational 
phase that begins next April, 
when the HBP will become 

an international organization 
intended to be a permanent 
infrastructure resource 

for neuroscientists. The 
management of the project 

has been modified following 
serious criticism by some 
neuroscientists during its 
start-up phase. See
go.nature.com/qybrng for more.


Arecibo future 

The US National Science 
Foundation (NSF) is seeking 
new management or new 
ownership for the Arecibo 
Observatory (pictured), it said 
in a 26 October notice. The
future of the facility, the largest
single-dish radio telescope
on Earth, in Puerto Rico, has
been in doubt for years. But the
NSF, which provides roughly
75% of Arecibo’s roughly
US$12-million budget, says
that it is interested in options
“that involve a substantially
reduced funding commitment
from NSF”. Astronomers use
the facility to study pulsars and
the upper atmosphere and to
help measure the risk posed by
near-Earth asteroids.


TREND WATCH


Women in developing nations are
challenging the gender bias often
found in engineering. A survey
in 10 countries, commissioned
by the Queen Elizabeth Prize for
Engineering Foundation, asked
10,000 people about their interest
in engineering (see
go.nature.com/khgpst). Overall, more
men expressed interest than did
women, but the gap was narrowest
in emerging economies. In
Britain, 28% of women and 58%
of men showed interest, whereas
the results for India were 79% for
women and 85% for men.


Indian protest 


Researchers in India have 
issued a warning over religious 
intolerance in the country. 
On 27 October, the Inter- 
Academy Panel on Ethics 

in Science, a body set up by 
the Indian National Science 
Academy in New Delhi, the 
Indian Academy of Sciences 
in Bangalore and the National 
Academy of Sciences in 
Allahabad, warned that 
recent events run counter to 
the country’s constitutional 


requirement to “uphold reason 
and scientific temper”. The 
statement follows the killing 
of three advocates of rational 
thinking, as well as other cases 
of violence linked to religious 
motives. An online petition 
voicing similar concerns was 
launched on 22 October. See 
page 20 for more. 


PEOPLE
Digitized lives 


Lauded bioinformatician 
Jun Wang, who stepped
down in July from his post as 
chief executive of the world’s 
largest genome-sequencing 
organization, BGI, in 
Shenzhen, has now launched 
his own company. Wang held 
an opening ceremony for 

the firm, called iCarbonX, 

in Shenzhen on 27 October. 
He says that the artificial- 
intelligence company will
become a “Google for biotech”
by collecting and analysing
genomic, proteomic and
other data from 1 million
people. He plans to start
recruiting within six months
and to have a prototype
platform in 3–5 years that
will connect individual
consumers, pharmaceutical
companies, hospitals and other
organizations.


GENDER BALANCE AND ENGINEERING

In developed nations, many more men than women are interested in
engineering, in stark contrast to the situation in emerging economies.


Scientist sacked 
Tsinghua University in Beijing 
confirmed to Nature on 

2 November that it dismissed 
neuroscientist Zhang Sheng- 
jia following a controversy 
over a protein that senses 
magnetism. In September, 
Zhang reported manipulating 
neurons in worms by applying 
a magnetic field to the protein. 
A researcher at neighbouring 
Peking University who claims 
to have discovered the protein's 
magnetic-sensing capability 
and was in the middle of 
publishing his own results 
complained that Zhang had 
published his paper first. 
Tsinghua University has not 
yet specified a reason for 
Zhang's dismissal. Zhang 
denies that there is anything 
wrong with his paper, questions 
the procedure that led to his 
dismissal and says that he will 
file a rebuttal. 


EPA versus VW 


The US Environmental 
Protection Agency (EPA) 

has issued a second notice 

of violation against car 
manufacturer Volkswagen 
(VW) over allegations that the 
company installed a device to 
circumvent emission standards 
in some of its vehicles. The 2 
November notice adds further 
car models to those listed on 
the notice from 18 September. 
VW previously admitted 
using ‘defeat devices’ to lower 
emissions during laboratory 
tests in some vehicles (see 
Nature http://doi.org/723; 
2015). 


Military-service members can suffer brain injury and memory loss when exposed to explosions in enclosed spaces, even if they do not sustain overt physical injury.


Memory-enhancement 
trials move into humans 


Research suggests that electrodes could compensate for damaged tissue. 


BY SARA REARDON 


A strategy designed to improve memory
by delivering brain stimulation
through implanted electrodes is
undergoing trials in humans. The US mili-
tary, which is funding the research, hopes
that the approach might help many of the
thousands of soldiers who have developed
deficits to their long-term memory as a result of
head trauma. At the Society for Neuroscience
meeting in Chicago, Illinois, on 17–21 October,
two teams funded by the Defense Advanced
Research Projects Agency presented evidence
that such implanted devices can improve a
person’s ability to retain memories.

By mimicking the electrical patterns that 
create and store memories, the researchers 
found that gaps caused by brain injury can 
be bridged. The findings raise hopes that a 
‘neuroprosthetic’ that automatically enhances 
flagging memory could aid not only brain- 
injured soldiers, but also people who have had 
strokes — or even those who have lost some 
power of recall through normal ageing. 

Because of the risks associated with surgically 
placing devices in the brain, both groups are 
studying people with epilepsy who already have 
implanted electrodes. The researchers can use 
these electrodes both to record brain activity 


and to stimulate specific groups of neurons. 
Although the ultimate goal is to treat traumatic 
brain injury, these people might benefit as well, 
says biological engineer Theodore Berger at the 
University of Southern California (USC) in 
Los Angeles. That is because repeated seizures 
can destroy the brain tissue needed for long- 
term-memory formation. 

Short-term memories are thought to be 
created when a part of the brain called the 
hippocampus aggregates sensory information, 
as well as the perception of space and time, and 
holds it readily accessible for a short while. 
Accessing the memory during that time will 
solidify it into a long-term memory. 
Key to this process is a signal that travels
from one part of the hippocampus called CA3 
to another, called CA1. Berger and his col- 
leagues hypothesize that recreating that signal 
might restore the ability to solidify memories 
in people with damage to the hippocampus. 

In one of the studies presented at the 
Chicago meeting, researchers asked 12 peo- 
ple with epilepsy to look at pictures and then 
recall up to 90 seconds later which ones they 
had seen. While the participants did this, the 
researchers recorded the firing patterns in both 
CA3 and CA1.

They then developed an algorithm that 
could use the activity of the CA1 cells to pre- 
dict the pattern that was coming from CA3. 
Compared with the actual patterns, their pre- 
dictions were accurate about 80% of the time. 

By using this algorithm, the researchers 
should be able to stimulate the CA1 cells with 
a pattern that mimics an appropriate CA3 sig- 
nal even if a person’s CA3 cells are damaged, 
Berger says. In previous studies on monkeys 
trained to do the picture-recall task, receiv- 
ing a juice reward when correct, his group has 
shown that stimulating CA1 with an appro- 
priate pattern significantly improved the 
animals’ performance (R. E. Hampson et al. 
J. Neural Eng. 10, 066013; 2013). 

USC biomedical engineer Dong Song, a 
member of the team, says that the group has 
tried the stimulation on a woman with epi- 
lepsy, but that it is too early to know whether 
it has improved her memory. He says that the 
researchers plan to apply it to more people 


in the coming months. Eventually, a device 
might be developed that would detect when 
the hippocampus is not efficiently encoding 
short-term into long-term memory and pro- 
vide stimulation to support the process. 

It is amazing that the memory-formation 
code can be so accurately predicted, says 
neurobiologist Howard Eichenbaum at 
Boston University in Massachusetts. But he 

cautions that mimicking it could be difficult if the CA1 cells
are so badly damaged that they will not respond properly
to stimulation. And he adds that because the hippocampus is
so complex and receives inputs from many 
connections in the brain, stimulating it with 
the CA3 signal alone may not be enough. 
Thomas McHugh, a neuroscientist at the 
RIKEN Brain Science Institute in Tokyo, says 
that he has been following the team’s work for 
years and has been consistently surprised at how 
well the approach has worked in animal models. 
“The data is convincing, but I’m still at a loss for 
understanding,” he says. Many parts of the brain 
are organized in obvious ways: in the motor 
cortex, for example, stimulating a particular 
spot causes motion in a specific part of the body.
But there is no such obvious organization in the 
hippocampus, so it is unclear why stimulating 
certain locations leads to predictable results. 
A team at the University of Pennsylvania 


(Penn) in Philadelphia is taking a different 
approach to enhancing memory that requires 
an even less detailed understanding of how the 
process works. 

The team exploits the fact that people’s 
memory skills fluctuate over time depending on 
variables such as how much caffeine they have 
consumed or whether they are under stress. 
The team has found, again by working with 
people with epilepsy, that stimulating a region 
called the medial temporal lobe, which houses 
the hippocampus, improves memory that 
is functioning poorly. But when memory is 
functioning well, stimulation impedes it. 

In a study that they presented at the Chicago
meeting, Penn neuroscientist Daniel Rizzuto 
and his colleagues recorded brain activity in 
28 people as they recalled a list of words. Using 
these patterns, the researchers developed an 
algorithm that predicted with high accuracy 
whether a person would remember a given 
word. By stimulating the brain only when a 
person read words that were likely to be forgot- 
ten, the researchers could boost performance 
by up to 140%. 
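
The closed-loop idea described here, stimulating only when a word is predicted to be forgotten, can be sketched in a few lines. This is an illustration only and not the Penn team's algorithm: the logistic model, the weights, the two-element feature vectors and the 0.5 threshold are all assumptions made for the example.

```python
import math


def recall_probability(features, weights, bias):
    """Hypothetical logistic model standing in for the team's classifier."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def should_stimulate(features, weights, bias, threshold=0.5):
    """Stimulate only when the word is predicted to be forgotten."""
    return recall_probability(features, weights, bias) < threshold


# Made-up weights and brain-state features, for illustration only.
WEIGHTS = [1.2, -0.8]
BIAS = 0.0
good_state = [1.0, 0.2]  # predicted recall probability above 0.5
poor_state = [0.1, 1.0]  # predicted recall probability below 0.5

for name, state in (("good", good_state), ("poor", poor_state)):
    decision = should_stimulate(state, WEIGHTS, BIAS)
    print(f"{name} state: stimulate = {decision}")
```

Gating the stimulation on the prediction reflects the finding reported above that stimulation helps memory that is functioning poorly but impedes memory that is functioning well.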

Penn psychologist Michael Kahana says that 
the team has recorded from the brains of about 
80 people in total and is seeking regulatory 
approval to use a more precise electrode array. 

Although it would be useful from a basic- 
science viewpoint to discover why stimulation 
works so well, McHugh says, it may be worth 
developing therapies based on it even if it is not 
fully understood — as long as it can be proved 
to be safe and effective.


Historic Rosetta mission to 
end with crash into comet 


There were other options, but super close-up shots on descent will provide science bonanza. 


BY ELIZABETH GIBNEY 


A year since a probe called Philae made
history by touching down on a comet,
the team that pulled off the feat is
plotting a different kind of landing. Next 
September, the European Space Agency will 
crash Philae’s mothership Rosetta into the icy 
dust ball, but as gently as possible. 

The dramatic act will bring the mission to 
an abrupt end — and give Rosetta’s wealth of 
sensors and instruments their closest view of 
the comet yet. “The crash landing gives us the 
best scientific end-of-mission that we can hope 
for,” says Rosetta project scientist Matt Taylor.


The collision will be emotional for the 
scientists, some of whom have worked on the 
mission since its inception in 1993. “There will 
be a lot of tears,” says Taylor.

Launched in 2004, the Rosetta orbiter 
caught up with the comet 67P/Churyumov- 
Gerasimenko ten years later as the rock was 
travelling from deep in space towards the 
Sun — and dropped Philae onto the surface a 
few months later, on 12 November. Scientists 
have not heard from Philae since July, and don't 
know if they will do so again, but Rosetta’s 
operations to survey the comet from orbit are 
in full swing. However, the orbiter can't keep 
up this work indefinitely. Funding for the 
mission runs out in September 2016 — and by
that time 67P/Churyumov-Gerasimenko will 
be well on its way back out into deep space, 
where the solar-powered orbiter will receive 
too little sunlight to function. 

Discussions about what to do with Rosetta 
when that happens have continued for more 
than a year. Rosetta flight director Andrea 
Accomazzo says that, ideally, Rosetta would 
hibernate while the comet remains in deep 
space, then be resurrected when 67P again 
approaches the Sun in 4 or 5 years’ time. But 
the cold of deep space would probably damage 
the craft, Accomazzo says; others fear that fuel 
and other resources would run out. Moreover,
many of the mission's principal investigators
(PIs) began their work more than 20 years ago
and “there’s no point putting an old experiment
with old PIs into hibernation”, jokes Kathrin
Altwegg, a planetary scientist at the University
of Bern.


Artist’s impression of the Rosetta orbiter approaching the comet 67P/Churyumov-Gerasimenko.

SPACECRAFT: ESA/ATG MEDIALAB; COMET: ESA/ROSETTA/NAVCAM

Crash-landing Rosetta emerged as the 
preferred option last year, but only now are 
orbiter navigators and operators working out 
how to go about it. Rosetta’s closest encounter 
with the comet so far was from 8 kilometres 
above the surface, when it dispatched Philae. 
The current thinking sees Rosetta spiral 
down to a similar distance next August before 
creeping ever closer in elliptical orbits and
crashing in September, says mission manager 
Patrick Martin — but that could still change. 
Although Philae sent back some data during 
its descent, Rosetta has more powerful — and 
more varied — sensors and instruments. The 
orbiter will also descend much more slowly than 
Philae did, allowing it to gather more data and 
better pictures. Once it gets to 4 kilometres, for 
example, Rosetta should be able to distinguish 
between the gases emerging from each of the 
duck-shaped comet’s two lobes to determine 
whether the regions vary in composition, says 
Altwegg, who leads the team behind ROSINA 


(the Rosetta Orbiter Spectrometer for Ion and 
Neutral Analysis). That could shed light on the 
environments in which each was formed. 

Rosetta’s cameras will get their best-resolu- 
tion shots of the comet's surface yet — less than 
1 centimetre per pixel once the craft is within 
500 metres of the surface, adds Holger Sierks, 
PI for Rosetta’s OSIRIS (Optical, Spectro- 
scopic, and Infrared Remote Imaging System). 
This will allow researchers to look at surface 
properties and link these to comet activity that 
Rosetta has observed from orbit. 
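
As a rough check on these figures, the quoted resolution implies an angular pixel scale that can be scaled to other distances. The sketch below uses the small-angle approximation and treats the quoted 1 centimetre per pixel at 500 metres as an upper bound. The function name is hypothetical, and the 4-kilometre distance is borrowed from the ROSINA example earlier in the article; neither is an OSIRIS specification.

```python
def pixel_scale_m(distance_m, angular_scale_rad):
    """Size of one pixel on the surface in metres (small-angle approximation)."""
    return distance_m * angular_scale_rad


# Angular scale implied by "1 centimetre per pixel at 500 metres".
implied_scale_rad = 0.01 / 500.0

# Same angular scale applied at 4 kilometres (illustrative distance).
scale_at_4km_m = pixel_scale_m(4000.0, implied_scale_rad)

print(f"{implied_scale_rad * 1e6:.0f} microradians per pixel")
print(f"{scale_at_4km_m * 100:.0f} cm per pixel at 4 km")
```

Because pixel size grows linearly with distance under this approximation, each halving of the distance to the surface halves the size of the smallest feature the camera can resolve.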


OVER AND OUT 

How far into the descent Rosetta will be able to 
send data back to mission control will depend 
on whether engineers can design the final tra- 
jectory such that the craft crashes on the side 
of the comet that faces Earth. Navigating while 
close to the comet will be difficult because the 
body’s gravitational field is uneven, but space- 
craft-operations manager Sylvain Lodiot hopes 
that the orbiter will transmit until the very end. 

The crash will definitely be a hard stop to the 
mission, he says, however gentle the landing. 
Designed to manoeuvre in orbit, once Rosetta 
is on the comet's surface it will no longer be 
able to point its antenna to communicate with 
Earth. Similarly, it will not be able to angle its 
solar array, so it will lose power, says Lodiot. 
“Once we touch, hit or crash, whatever you 
want to call it, it's game over.” 

Before then, though, the mission still 
has much to accomplish. As the comet 
approached the Sun, it heated up, with vapor- 
izing ice causing more and more gas and dust 
to stream from its surface. Rosetta had to 
retreat into a wider orbit to stop the dust from 
confusing its navigation system. But now that 
the comet is speeding away from the Sun, 
mission scientists are relishing the oppor- 
tunity to steer Rosetta back in. Priorities 
will then be to get images that would enable 
comparisons of the comet before and after its 
swing around the Sun, as well as a close-up of 
the southern hemisphere, which was largely 
in darkness until May and will disappear back 
out of view in March. 

Rosetta will also resume listening out for 
Philae. Given the huge public interest in any- 
thing to do with the lander, Rosetta’s finale 
will make for a fitting end to the story, adds 
Altwegg. “This way Rosetta gets to live happily 
ever after on the comet with Philae.”
Raging forest fires threaten orangutans such as this one at a rehabilitation centre in Borneo.


CONSERVATION 


Scramble to save 
Borneo’s orangutans 


Fuelled by El Niño and land-management blunders,
Indonesian fires are consuming precious habitat. 


BY NADIA DRAKE 


The world’s only wild orangutans —
already besieged by logging, hunting,
pet trading and the steady expansion
of palm-oil plantations — are now threatened
by forest fires that have burned for months on
the islands of Borneo and Sumatra in southeast
Asia. In the toxic smoke and haze, locals and
researchers are scrambling to protect the esti-
mated 50,000 remaining orangutans that live
only on those two islands.

Fires erupt every year in Indonesia during 
the dry season, as farmers, plantation own- 
ers and others deliberately burn forest to clear 
land or to settle territorial disputes. But this 
year’s El Niño weather pattern, combined with
a legacy of land-management practices that 
have dried the soil and degraded vast swathes 
of peat-swamp forest, turned this burning 
season into an environmental catastrophe 
that has destroyed more than 2 million hec- 
tares of forest throughout Indonesia, to which 


Sumatra and much of Borneo belong. 

Since late summer, teams of researchers 
have headed out from the city of Palangkaraya 
in Borneo to find and fight new blazes. Some 
patrol the rivers and others head into the 
forest, where extinguishing the flames can 
require drilling more than 20 metres down to 
reach the water table — tough, gruelling work 
that is carried out amid tropical heat and in a 
persistent, menacing orange haze. 

One day in October, Simon Husson, 
director of the UK-based Orangutan Tropi- 
cal Peatland Project, deployed a drone at the 
Borneo Orangutan Survival Foundation’s 
centre for orangutan rescue and rehabilitation 
near Palangkaraya. “Eyes in the sky are a huge 
help,” he says. “On the ground, you're in chok-
ing smoke and the haze is severely restricting
visibility.”

As the drone rose above the smoggy blanket, 
its camera glimpsed a new fire burning deep in 
the forest. The fire was remote enough not to 
threaten the orphaned and injured orangutans 
being readied for reintroduction to the forest,
“but you can't help thinking about the wild
ones out there”, Husson says.

Husson and his colleagues have temporarily 
abandoned their normal research activities in 
the 6,000-square-kilometre Sabangau Forest, 
which is home not just to orangutans but also to 
rare Bornean white-bearded gibbons, sun bears 
and pangolins, to help local fire-fighting teams 
with cash and personnel. “Not only is [research] 
pretty unimportant right now,” he says, “it’s basi-
cally impossible to study the orangutans in the
canopy as we can't see them for the smoke.” 

Peat fires devastate orangutan populations 
primarily by destroying crucial habitat, but 
the animals are also susceptible to the same 
types of smoke- and haze-induced respiratory 
problems as humans. The charismatic arboreal 
apes are already endangered throughout their 
range; their population is estimated to have 
declined by 78% from more than 230,000 a 
century ago. “Over half the world’s orangutans 
live in peat-swamp forests, and every one of 
these peatlands in Borneo right now is on fire, 
somewhere,” Husson says.
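
The decline figure can be checked against the population estimate quoted earlier in the article. This is a minimal arithmetic sketch that treats "more than 230,000" as exactly 230,000, which is an assumption made for the calculation.

```python
HISTORIC_POPULATION = 230_000  # assumed exact; the article says "more than 230,000"
DECLINE_FRACTION = 0.78        # 78% decline over the past century

remaining = HISTORIC_POPULATION * (1 - DECLINE_FRACTION)
print(f"Implied remaining population: about {remaining:,.0f}")
```

The result, roughly 50,600 animals, is in line with the estimated 50,000 orangutans said to remain on Borneo and Sumatra.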

Undisturbed peat forests are actually 
incredibly fire resistant, says Susan Page, a 
geographer at the University of Leicester, 
UK, who studies peatlands in southeast Asia, 
because the swamps are damp enough to make 
ignition difficult. But, unfortunately, large 
tracts of Borneo’s peatland are anything but
undisturbed. In 1996, Indonesia's then-presi- 
dent Suharto launched the Mega Rice Project, 
which tried to transform 1 million hectares of 
Bornean peatland into rice paddies. Draining 
the peat was essential for the plan, and despite 
the fact that no rice was ever harvested, canals 
that were cut through the forests have been 
draining water from the peat ever since. 

The infernos in Indonesia have climate 
implications as well. Normally, Borneo’s peat 
forests are efficient carbon stores, holding 
tonnes of organic matter in layers of com- 
pressed plant material that can be more than 
15 metres thick. But when that peat burns, 
the accumulated carbon is released. This 
year, the fires have already released more 
than 1.5 billion tonnes of carbon dioxide into 
the atmosphere — more than Japan’s annual 
carbon emissions. Since September, carbon 
emissions due to the fires have exceeded the 
daily production of the United States on at least 
38 days, prompting one conservation scientist 
to call this year’s fires the “biggest environmen- 
tal crime of the twenty-first century”. 

The situation is unlikely to get better without 
an extended period of rain or a serious com- 
mitment from the Indonesian government. If 
the El Niño-driven drought persists, as some
climate models predict, this year’s fire season 
could last well into 2016. 

“Severe fires did not occur before there 
was intensive land-use development,” Page 
says. “Solutions will require strong political 
leadership and investment.”
Tech investors bet on
synthetic biology 


Once hesitant, Silicon Valley venture capitalists are warming to the idea of engineered cells. 


BY ERIKA CHECK HAYDEN 


In 2012, Emily Leproust was trying to raise
money to start Twist Bioscience, a company
that aimed to synthesize DNA more
quickly and more cheaply than existing methods
allowed. But many investors were spooked
by the perception that synthetic biology — the
engineering of microorganisms to make useful
products such as drugs, food ingredients and
materials — would not turn a profit. “It was a
lonely time,” Leproust recalls.

Three years later, Silicon Valley’s big fish —
technology investors with billions of dollars at
their disposal — have finally ventured into synthetic
biology’s small pond. Scared away from
conventional biotechnology in the past by the risky
and expensive prospect of drug development,
they are now lured by what they see as synthetic
biology’s huge market potential, plummeting
operating costs, improved business models and
an increasing emphasis on computing.

“The toolkit is evolved from where it was two
years ago; synthetic biology is going through a
digitization and automation,” says Nan Li, principal
at investment firm Obvious Ventures in
San Francisco, California. Li has coined the
term ‘biobiotics’ for the current state of synthetic
biology. “We see that this looks a lot more
like a data and software problem, and we can
understand that and get excited about that.”

Software tools and robotics have reduced the
cost of all parts of the process, from creating a
genetic ‘program’ to inserting it into a microbe
and testing it in the lab. For instance, synthetic-biology
start-up firm Zymergen in Emeryville,
California, uses machine learning
— computer algorithms that evolve in
response to data — to guide the engineering
of fungi and bacteria that perform industrial
processes more efficiently. The company also
depends heavily on robotic automation of its
labs, reducing the need to pay human workers.
Zymergen and Twist Bioscience are among a 
global class of synthetic-biology firms that have 
raised a record-breaking US$560.7 million this 
year, including $227 million for companies that 
use the wildly popular CRISPR/Cas9 gene-editing
technology, says John Cumbers, founder
of the SynBioBeta industry group (see ‘Money for
microbes’). Overall, 24 newly created synthetic-biology
companies have raised funding in 2015,
compared to fewer than 6 in 2012.

MONEY FOR MICROBES
Investments in synthetic-biology start-ups have increased dramatically in the past three years. Much of the funding comes from prominent technology investors.

COMPANY | YEAR FOUNDED | BUSINESS | TOTAL FUNDS (US$) | NOTABLE INVESTORS
Twist Bioscience | 2013 | DNA synthesis | $82.11 million | Yuri Milner (Internet-company investor)
Zymergen | 2013 | Microbial-strain optimization | $44 million | Obvious Ventures; Eric Schmidt (Alphabet executive chairman)
Ginkgo Bioworks | 2008 | Microbial engineering | $54.12 million | Matt Ocko (Facebook and Zynga investor)
Bolt Threads | 2009 | High-performance fibres | $40 million | Peter Thiel and Max Levchin (PayPal co-founders)
Transcriptic | 2012 | Robotics for biology labs | $14.37 million | Jerry Yang (Yahoo co-founder)
Riffyn | 2014 | Software | $1.8 million | O'Reilly AlphaTech Ventures
Emerald Therapeutics | 2010 | Technology platforms | $34 million | Peter Thiel and Max Levchin

Much of the work that goes into developing
a synthetic-biology product is shifting to computers
and robots. This means that as the sector
grows, every dollar invested can produce more
progress. “That reduces the cost for the venture
capitalist who is having to put up the money,”
says Matt Ocko, co-managing partner of the
Data Collective fund and one of several investors
who will speak at the SynBioBeta industry
conference in San Francisco on 4–6 November.


PAST MISTAKES
The new generation of synthetic-biology
companies also has the advantage of experience.
Many of today’s founders are alumni of firms
that struggled because they set their sights on
huge, highly regulated industries that were hard
to break into — such as pharmaceuticals or fuel.

Now, by contrast, start-up companies are
focusing on niche areas where they can quickly
bring products to market, such as speciality
chemical, food, cosmetics and clothing industries
— while hoping that opportunities will
emerge to tackle other, bigger targets.

“People have come up with much more clever
ways of generating revenue much faster,” says
Derek Greenfield, a co-founder of Industrial
Microbes, which aims to engineer yeast that
can synthesize chemicals using methane as a
raw material. Bolt Threads in Emeryville, for
example, is making fabrics from yeast by engineering
the cells’ metabolic pathways to mimic
the processes used by spiders to make silk.

Other companies aim to use microorgan- 
isms to make rubber, egg proteins, rhino horn, 
vanilla flavouring, rose-scented extract or cof- 
fee more cheaply, more ethically or of higher 
quality. For example, coffee fermented by engi- 
neered microbes could replace a high-end brew 
made with beans harvested from civet faeces 
under conditions that some consider inhumane. 

Long term, synthetic-biology businesses have 
the potential to achieve something more mean- 
ingful — and profitable — than another social- 
media site or cloud-storage service, says Jason 
Kelly, co-founder of microbial-engineering 
company Ginkgo Bioworks in Boston, Massa- 
chusetts. Like electric-car maker Tesla or com- 
mercial spacecraft company SpaceX, he says, 
synthetic biology could revolutionize economic 
sectors that are ripe for innovation — or even 
create new industries. 

Li concurs. “As we understand more about 
the biology and limitations of these microbes, 
there’s a potential to create entire new product 
categories,” he says. “A lot of limitations that 
we take for granted can be stretched or pushed; 
there’s just a lot more levers to pull.”
INDIA

Scientists decry 
killings of 
secularists 


Indian academy members 
condemn intolerance. 


BY T. V. PADMA 


Indian scientists are voicing concerns over
religious intolerance and the killings of
three noted advocates of rational thinking.
The actions are unusual in a country where scientists
rarely comment on political issues, says
physicist Shri Krishna Joshi, a member of India’s
Inter-Academy Panel on Ethics in Science.

Anti-superstition activist Narendra
Dabholkar was killed in 2013, communist
politician Govind Pansare in February this year
and literature scholar Malleshappa Kalburgi in
August. All three deaths have been blamed on
members of extreme right-wing Hindu groups.

On 22 October, scientists launched an
online petition to India’s president, Pranab
Mukherjee, protesting against the killings. “The
government has failed to check or discourage
the anti-rational environment,” says petition
signatory Naresh Dadhich, a physicist at the
Inter-University Centre for Astronomy and
Astrophysics in Pune, India.

The petition was followed on 27 October by
a statement from the Inter-Academy Panel on
Ethics in Science, set up by the Indian National
Science Academy in New Delhi; the Indian
Academy of Sciences, Bangalore; and the
National Academy of Sciences in Allahabad.
The Indian constitution mandates that “its citizens
abide by and uphold reason and scientific
temper”, the statement said. Several statements
and actions “run counter to this constitutional
requirement,” it notes.

Indira Nath, a member of the panel and an
immunologist at the Indian National Science
Academy, says that the panel wants to “bring
back rationality and scientific thinking to the
mainstream”.

Also last week, more than 100 scientists
from leading Indian institutes, including
national award winners, three fellows of the
Royal Society in London, and a foreign associate
of the US National Academy of Sciences,
signed a second statement expressing deep
concern over the “climate of intolerance”.

Pushpa Mittra Bhargava, former director
of the Centre for Cellular and Molecular Biology
in Hyderabad, says that he plans to return
a national award in protest. “Science is about
reason and rationality. If three rationalists can
be killed, scientists too can be killed.”


20 | NATURE | VOL 527 | 5 NOVEMBER 20 


CRYSTAL CHALLENGE 


The 3D structure that a molecule adopts in a crystal is very difficult 
to predict — but defines what properties the molecule has. 


The structural formula of 
a molecule reveals which 
atoms are connected at a 
2D level. 


CHEMISTRY 


Chemists are making progress at 
predicting how complex molecules 
will assemble in 3D space — there 
are millions of possibilities. 


The 3D orientation repeats in a 
crystalline lattice with a structure that 
dictates the molecule’s mechanical, 
chemical and physical properties. 


Software predicts 
crystal structures 


Chemists have succeeded at a fiendish task — forecasting 
how complex molecules will assemble in 3D. 


BY ELIZABETH GIBNEY 


Sketch the structure of an organic molecule
on a napkin and it may not be
apparent that there are millions of possible
ways that it could assemble as a 3D crystal.
Now, a collaboration of dozens of chemists
and computer programmers has successfully
predicted the crystal structure of five complex,
‘drug-like’ organic molecules — using
nothing but a 2D map showing which atoms
connect to which.

The achievement, announced on 27 October
at a workshop in Cambridge, UK, paves
the way for software that would cut the cost
of the design and manufacture of drugs and
other chemical products, as well as further
our understanding of fundamental chemistry.

A molecule’s crystal structure determines
its properties (see ‘Crystal challenge’). In
1998, the US pharmaceutical firm Abbott
Laboratories learned this the hard way when
it had to pull production of the capsule form
of the HIV treatment ritonavir because the
drug had started switching to an unexpected
structure during manufacture. The crystal
structure that a molecule adopts is generally
the one with the lowest energy, but predicting
what this is for any particular molecule
is “fiendishly difficult”, says Colin Groom,
executive director of the Cambridge Crystallographic
Data Centre (CCDC).

Even when chemists know which atoms
are connected to which, the atoms can still be 
in different orientations because the bonds 
that connect them can bend and rotate in 
myriad ways. There are also multiple options 
for how molecules can pack together. “It is 
like looking for a needle in an unimaginably 
big haystack,” says Anthony Reilly, a struc-
tural chemist at the CCDC. 

Since 1999, the CCDC has organized 
six challenges known as the Blind Test of 
Organic Crystal Structure Prediction Meth- 
ods. Rather than a contest, organizers see the 
challenge as a large collaborative attempt to 
compare the strengths of the latest tech- 
niques. “The groups participating represent 
pretty much the entire crystal-structure pre- 
diction community, and the methods used 
are the very best developed,” says Groom.

The challenge typically takes place over 
a year, and sets two major problems. First, 
teams must come up with a list of all pos- 
sible arrangements in which the molecules 
could form a crystal. Some teams do a rough
calculation of the energy of each to whittle 
down the list, burning up hundreds of thou- 
sands of hours of computing time; others 
start with pure guesses and iteratively ‘breed’ 
the most stable to derive possible candidates 
more quickly. In the second stage, teams 
take the shortlists — sometimes assembled 
by a different group — and do more-precise
calculations of the energy of each, producing
a ranking of the candidates. 
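
As a rough illustration of the two-stage workflow described above, the following Python sketch generates candidate arrangements, shortlists them with a cheap energy estimate and then ranks the shortlist with a costlier one. The two energy functions, the representation of a candidate as a tuple of numbers and every name in the code are hypothetical stand-ins chosen for illustration; they are not the methods used by any team in the blind test.

```python
import random


def coarse_energy(candidate):
    # Cheap, approximate score: a hypothetical stand-in for a rough energy estimate.
    return round(sum(x * x for x in candidate), 1)


def precise_energy(candidate):
    # Finer score: a hypothetical stand-in for a more precise energy calculation.
    return sum(x * x for x in candidate) + 0.01 * sum(abs(x) for x in candidate)


def generate_candidates(n, dims=3, seed=0):
    # Stage 1a: propose many possible arrangements (here, random toy parameters).
    rng = random.Random(seed)
    return [tuple(rng.uniform(-1.0, 1.0) for _ in range(dims)) for _ in range(n)]


def shortlist(candidates, keep):
    # Stage 1b: whittle down the list using the cheap estimate.
    return sorted(candidates, key=coarse_energy)[:keep]


def rank(candidates):
    # Stage 2: order the shortlist by the more precise energy, lowest first.
    return sorted(candidates, key=precise_energy)


candidates = generate_candidates(1000)
short = shortlist(candidates, keep=20)
ranking = rank(short)
print("Lowest-energy candidate in this toy model:", ranking[0])
```

In this toy setting the shortlist and the ranking could come from different functions written by different groups, which mirrors the way teams in the challenge sometimes ranked shortlists assembled by others.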

The latest challenge, which included a 
record 25 teams — ten more than the pre- 
vious contest in 2010 — brought a “massive 
improvement”, says Groom. The molecules
selected were “nasty, real-life systems” of the 
size and complexity to be interesting to drug 
companies. Previous challenges had included 
molecules that were flexible or made from 
multiple parts. This year’s challenge combined 
such features in the same molecules and, for
one target, asked participants to predict not
just one stable structure, but all its many stable 
forms, known as polymorphs. 


PROBLEM SOLVED 
The teams rose to the challenge: at the 
Cambridge workshop, the CCDC announced 
that each of the five targets, and their poly- 
morphs, appeared in at least one of the short- 
lists produced by the various methods. A paper 
with the full results will be published in a spe- 
cial issue of Acta Crystallographica Section B. 
Moreover, one team, led by Marcus 
Neumann at the German company Avant- 
garde Materials Simulation in Freiburg, 
included the correct solution in each of its 
shortlists. Had the team combined its efforts 
with those of a group — led by theoretical 
chemist Alexandre Tkatchenko at the Fritz
Haber Institute in Berlin — that got a perfect
score in the ranking phase, the two would 
together have achieved a perfect score for both 
rounds and across all targets. Such a result 
has never occurred in the history of the contest.
“With what you have seen from me, and
what you have seen from Tkatchenko,” says
Neumann, “it is fair to claim that to a large
extent, this blind test has shown
that the problem of organic crystal structure
prediction has been solved.”

More so than in previous blind tests, teams 
including both Neumann’s and Tkatchenko’s 
took into account how quantum mechanical 
interactions would contribute to the energy 
of structures. In particular, Tkatchenko used 
a method published just last year that encom- 
passed these interactions over longer ranges than 
has been done previously. And Neumann says 
that his program was unique because it made 
every decision by itself; most others required 
human decisions once the computer had 
returned its calculations. “We have finally kicked 
the user out of the equation,” Neumann says. 

Although others agree that the joint feat is 
a milestone, they stop short of declaring the 
problem of crystal structure prediction solved. 
“This does not mean that they would have 
cracked the problem of predicting all organic
crystal structures,” says Sally Price, a theoreti- 
cal chemist at University College London. 

And some are frustrated that Neumann has 
refused to release his computer code: “The 
day I have a pension plan, I will talk about 
this freely,” he told the workshop. That will 
make it hard for others to build on his team’s 
breakthrough. “We don't really have a sense 
of how it works,” says challenge participant 
Claire Adjiman, a chemical engineer at Impe- 
rial College London. “But I understand why he 
doesn't tell us more.” 

Tkatchenko and Neumann now plan to 
work together. “My own interest is to under- 
stand polymorphism and be able to offer tools 
to people,” says Tkatchenko. “His interest is
more commercial, but I’m sure we can find 
the middle ground.”

Both Price and Neumann, meanwhile, 
are already working with industry on how 
to use their prediction calculations in drug 
development.


CORRECTION 

The News story ‘Vaccine gets cautious 
boost’ (Nature 526, 617-618; 2015) 
incorrectly stated that David Kaslow was 
involved in the early development of RTS,S. 
THE BABY
EXPERIMENT 


BY LINDA GEDDES 




A London lab is deploying every technology it can to understand 
infant brains, and what happens when development goes awry. 


Baby Ezra is sitting on his mother’s lap and staring at the computer screen with the amazement of
someone still new to the world. The five-month-old’s eyes rest on a series of pictures: three dancing
women, four black circles, then a face among random objects. Ezra studies the screen with
fascination — although now and then, his attention wanders. He lets out a gurgle, and moments
later, a short cry. He is chewing a sock.
Below the screen, a box is shining infrared light at his cornea, and then capturing and processing
the reflected light to work out the direction of his gaze. Behind a curtain, postdoc Jannath Begum
Ali checks the data streaming in on her monitor. This set-up is part of a sophisticated experiment
to understand the early development of the human mind in the Babylab at Birkbeck, University
of London. The scientists here will closely monitor Ezra’s brain and behaviour at visits over the
next two and a half years.
Oblivious to his important role in science, Ezra furrows his brow into a frown. What happens
next is apparent only to his mother, who turns him around and checks his behind. With just half of a
planned 15-minute observation complete, Ezra has defecated. At that point, everyone takes a break.

At Babylab, a 6-month-old has her brain’s electrical activity monitored.
How do you get into the mind of a human being who cannot speak,
does not follow instructions and rudely interrupts your experiments? 
That is the challenge embraced by scientists at the Babylab. The brain 
undergoes more change during the first two years of life than at any 
other time: consciousness, traits of personality, temperament and abil- 
ity all become apparent, as do the first signs that development could be 
drifting off course. But this period is also the most difficult to explore, 
because many of the standard tools of human neuroscience are useless: 
babies will not lie awake and still in an imaging machine, and they can- 
not answer questions or do as they are told. Researchers have measured 
infants’ interest and attention mostly by tracking their gaze — but even 
this method has been criticized as crude. 

“There are many studies where someone tries to prove that 
the baby understands goals, causality, number — and in 99% of 
those studies the only measure they look at is a change in looking 
time,” says Jerome Kagan, a psycholo- 
gist at Harvard University in Cambridge, 
Massachusetts. 

The field is now becoming more sophis- 
ticated, thanks in part to the Birkbeck lab. 
Scientists there have pioneered techniques 
such as infant near-infrared spectroscopy
(NIRS), which measures brain activity by 
recording the colour, and therefore the oxy- 
genation, of blood. They are also trying to 
strengthen conclusions by combining multi- 
ple techniques. Among the handful of baby labs around the world, this 
makes the London one stand out. “They are doing research on babies 
using every single technique you could imagine,” says Richard Aslin, an
infant-behaviour researcher and director of the Rochester Center for 
Brain Imaging in New York. 

The lab has used such tools to reveal a series of ‘firsts’ about the infant 
mind: that babies prefer to look at faces that are looking directly at them, 
rather than away from them; that they respond to such direct gaze with 
enhanced neural processing; and that changes in this brain response
may be associated with the later emergence of autism — the first evi- 
dence that a measure of brain function might be used to predict the 
condition. In 2013, the Babylab started the flagship project of which
Ezra is part: an effort to study infants from 12 weeks old who are at 
high risk of autism spectrum disorder or attention deficit hyperactivity 
disorder (ADHD), alongside a control group, in order to detect more 
early signs of these conditions and find behavioural therapies that might 
help. “It’s an exciting, and emerging, field,” says Mark Johnson, director
of the Babylab. 

And, like its subjects, the London lab is growing up. In 2014, Johnson 
received £2.3 million (US$3.5 million) from a trio of foundations to 
establish a toddler lab at Birkbeck, in which children aged 18 months to 
3 or 4 years old will be attached to wireless forms of electroencephalog- 
raphy (EEG), NIRS and eye-tracking technology as they walk around, 
play and interact with other children. The aim is to understand the 
brain during toddlerhood, the time when children start to appreciate the 
difference between self and other, complex language develops and long- 
term memories are first laid down. “In child development in general, 
but also in our brain-development work, the terrible twos are a major 
black hole,” Johnson says.


LOOK AND LEARN 


There is a well-worn adage in show business that you should never 
work with children or animals. Johnson built his career doing both. For 
his PhD project in the 1980s, he investigated whether day-old chicks 
formed social attachments to any object placed in their pen, or if they 
preferred ones that resembled a mother hen. (The chicks were particu- 
larly drawn to objects with hen-like necks and faces, but weren't too 
fussy about the rest of their looks.) But Johnson was more interested
in human development, so after his PhD he took a research-scientist 
position in London to begin studying infants. “In some ways that’s not as 
big a jump as it sounds,” he says. “In both cases you're trying to develop
tasks and get information from non-verbal creatures.” 

Scientists have been attempting practical research with babies since 
the middle of the twentieth century. One of the first to do so was Jean 
Piaget, a Swiss psychologist who used detailed observations of infants 
and older children to gain insight into how they understand the world — 
including, famously, by hiding an object to see whether infants try to 
find it. He concluded that babies cannot grasp the concept that an object 
still exists when it is out of sight until they are around eight months old. 
Piaget went on to develop the theory that babies are essentially born as 
blank slates, but possess the machinery that motivates them to explore 
the world and allows them to assimilate knowledge. 

Infant neuroscience leapt forward in the early 1960s, when the 
US developmental psychologist Robert Fantz started measuring the 
amount of time babies spent looking at something as a way to gauge 


“They are doing research on 
babies using every single 
technique you could imagine.” 


how interested in it they were. Fantz reported that a two-month-old 
baby spent twice as long looking at a sketch of the human face as at a 
bullseye, for instance. Experiments based on gaze measurements have 
been the field’s workhorse ever since. “It is no exaggeration to say that 
without looking-time measures, we would know very little about nearly 
any aspect of infant development,” says Aslin. Gaze experiments have 
led some researchers to conclude that, far from being blank slates, babies 
are born with an innate appreciation of number and human faces, as well 
as the ability to recognize when their mother’s native language is being 
spoken — a familiarity proposed to develop through hearing speech 
while in the womb. 

“There have been literally thousands of experiments done with these 
looking-time methods,” Aslin says, “and by and large it is a pretty reli- 
able technique; you can have two labs running the same experiment 
and you get the same results.” But Aslin and Kagan are two of a growing
number of researchers who think that such infant studies should be 
viewed with caution: it can be dangerous to infer too much about the 
workings of a baby’s mind from just their fleeting glance — and they 
worry that some labs do not control for confounding factors as well as 
they should. “Looking time is under the control of so many conditions,”
Kagan says. “What are the physical features of the stimulus? Are its 
lines mainly curved or straight? What colours are present? How much 
contrast in lighting is there?” 

Babies’ brains are growing and developing at an extraordinary pace, 
which makes comparisons between different ages difficult: a newborn's 
gaze might reflect innate abilities, but a seven-month-old’s will also be 
influenced by what he or she is starting to learn and remember about the 
world. “An infant may look longer in order to relate the event to what it 
already knows,” says Kagan. “The main point is that no single measure 
is able to supply all the evidence required for conclusions about what 
infants know.” 

That was the opinion that Johnson quickly reached when he began 
infant research: the reliance on looking time and observations alone
was unsatisfying. He established a baby lab at University College Lon-
don (UCL) in 1993, and it moved to more spacious premises at Birk- 
beck in 1998. From the start, Johnson wanted to take a more high-tech 
approach to investigating brain development than the handful of
other similar labs.

In 2005, Johnson and his colleagues combined observations of 
looking time with electrical measurements of brain activity to investigate 
Piaget's claim that infants younger than nine months do not understand
the permanence of an object that has vanished. When adults view an 
object disappearing, they tend to show an increase in a particular type 
of neural oscillation over the right temporal cortex. Johnson, working 
with colleagues Gergely Csibra and Jordy Kaufman, showed that six- 
month-old babies show a similar pattern — suggesting that they do keep 
hidden objects in mind. The same pattern was not observed when the 
object disintegrated instead of being hidden.

Studies such as these have convinced Johnson that babies are not born 
blank slates, but neither do they possess adult-like concepts about things 
like number. “My work, I think, goes for a middle ground,” he says. He
argues that the newborn has basic attention preferences for things such 
as faces and speech, and that these preferences shape the brain as it
develops. Johnson’s observation that young babies prefer direct eye contact is
one such example; this sets them up to focus on socially relevant parts of 
their surroundings, which in turn enables them to learn about language 
and other social cues such as facial expressions. 


HAPPY BABY 


Working with babies requires specialized kit — particularly for a 
laboratory that can see as many as 14 in a day. The Babylab kitchen hosts
a bottle-warmer, and bathrooms are well stocked with wet-wipes. The 
waiting room is brightly decorated and scattered with easy-to-clean toys. 
The laboratories, however, are largely empty and painted a dull battle- 
ship grey — a deliberate choice, because babies are easily distracted. “We 
try to make it as boring as possible, except for the thing we need them 
to focus on,” says Leslie Tucker, coordinator of the Centre for Brain and
Cognitive Development, of which the Babylab is part. 

Hungry or tired babies do not make for good experiments, so every- 
thing is carefully planned around meals and naps. In the waiting room, 
Caitlin — a four-month-old in stripy blue dungarees — is receiving a 
last-minute breastfeed before being ushered into a lab. She is partici- 
pating in a study to assess the development of mimicry in babies: the 
unconscious tendency of people to frown when someone else frowns, 
or smile when they smile. 

“Mimicry serves important social functions in adults and has even 
been suggested to be the ‘social glue’ that binds us together,” says Carina
de Klerk, who is leading that study at Birkbeck. But very little is known 
about how, and when, it develops. Some researchers think that it is 
something babies are born with — newborns have been observed to 
stick their tongues out in response to an adult doing the same. But
“it’s not clear if the baby is actually copying, or perhaps they just stick
out their tongue whenever something exciting happens”, de Klerk says.

She sings to baby Caitlin while sticking electrodes on her temples, 
cheeks and under her chin. The baby seems unsure, so a research assis- 
tant appears, brandishing a garish musical telephone. The art of dis- 
traction is a fundamental skill that anyone working in a baby lab must 
quickly master. “Researchers from other fields come down here and 
are often horrified at the lack of controls,” says Tucker. “You're going to 
interrupt the experiment if you have to, or make noises to distract them 
if they look like they’re going to cry.”

It works: Caitlin is now cooing and smiling. The researchers pause for 
a moment, while Caitlin’s mother takes a photo of her “science baby” on 
her phone. Then Caitlin is shown a series of video sequences of a woman 
raising her eyebrows or opening and closing her mouth, interspersed 
with static pictures of farm animals. 

The mimicry experiment is a prime example of the Babylab’s mixed- 
methods approach. Baby Caitlin stares intently at the screen; she does 
not seem to be copying the woman's actions. But the electrodes on her 
face may tell a different story: the technique, called electromyography 
(EMG), picks up electrical activity in her facial muscles, which will 
indicate if Caitlin is activating her eyebrow area — even if she is not 
overtly moving it — in response to the woman raising hers. Later in 
the day, Caitlin is shown the same video sequence while hooked up 
to NIRS. 

NIRS is transforming the ability of researchers to peer into the minds 
of babies. It was originally adopted by medical physicists at UCL as a 
technique to help predict the risk of stroke in premature babies. They 
then began working with Birkbeck researchers to adapt it to answer 
more fundamental questions. By tracking the flow of oxygenated blood,
NIRS allows scientists to see which brain areas become more active in 
response to external events. For instance, a 2009 study from the Babylab 
revealed that the brains of five-month-olds already show an adult-like 
pattern of activation in response to social stimuli, such as a woman 
playing peek-a-boo with them. In the mimicry study, the researchers
want to see if the babies’ brains show a similar pattern to those of adults 
who are mimicking others, which should help to explain if mimicry is 
partly innate. 

But NIRS is not perfect, in part because it cannot measure what is 
happening in important inner brain regions such as the hippocampus 
or the amygdala. “The brain is a complex connected circuit. If you only 
measure a superficial part of that circuit, you can come to the wrong 
conclusions,” Kagan says. To assess these deeper areas, researchers need 
a technique such as functional magnetic resonance imaging (fMRI), 
which has yielded huge insight into the adult brain. But fMRI is highly 
sensitive to movement, so babies can be scanned only if they are sedated 
or asleep, which has severely limited the technique’s use. 


AN EYE ON AUTISM 


Looking time remains an important tool at Birkbeck and elsewhere — 
although these days, it is assessed not by human observation but by pre- 
cise eye-tracking technology, such as that being used on baby Ezra. Ezra 
is a control for the autism and ADHD study: he does not have an older 
sibling with one of the disorders, so is not considered at high risk. As his 
attention flits between the apparently random objects on the screen, the 
reflected infrared light allows psychologist Emily Jones — who directs 
the project — to gauge precisely what he is looking at, and in which order. 
“What we tend to find is that typically developing babies will always look 
first, and longer, at the face, before looking at the other objects,” she says. 

Autism and ADHD have become a major focus of the Babylab as the 
prevalence and awareness of the conditions have risen in the past two 
decades — they are now believed to affect around 4% of the UK popula- 
tion. Last year, in a study of 104 infants, the Birkbeck team showed that 
infants at high risk of autism were drawn towards the face first, but they 
seemed to spend less time overall than ‘neurotypical’ babies in looking 
at any of the objects — and those that went on to develop autism had the 
shortest looking time of all. A separate eye-tracking study published by 
the group earlier this year revealed that nine-month-olds who went on 
to develop symptoms of autism were more likely to spot the odd-one- 
out among a group of letters on a screen. 

Not your average lab: the Babylab (left) is designed for infants; a row of EEG 
‘hairnets’ (middle); and an eye-tracking experiment under way (right). 

It is not completely clear why this is, but the working hypothesis is 
that these infants are more attentive to the details of what they see, says 
Teodora Gliga, who led the odd-one-out study. The downside of this 
could be that children who go on to develop autism find it harder to draw 
general conclusions about what they are seeing, she says. The study of 
which Ezra is part aims to extend this work by collecting more-detailed 
measures from over 400 families — and to identify those features that 
are strongly associated with the later onset of a developmental disorder. 
During the five visits that Ezra will make to the Babylab as he grows up, 
he will be tested using EEG, NIRS and EMG, and his parents will be given 
extensive questionnaires to assess his language skills, social development, 
temperament and sleeping patterns. 

The team hopes that early brain differences could some day provide 
indicators — or biomarkers — of autism, which isn’t usually diagnosed 
until close to a child’s third birthday. They also hope to find ways to steer 
brain development back towards a more typical course. 

One clinical trial at the Babylab already suggests that early interven- 
tion can have an effect. Babies in 28 families with an older sibling with 
autism were randomly assigned to a group in which they were visited by 
a therapist at least six times between the ages of seven and ten months, 
and were compared with a group of high-risk babies who received no 
therapy. The therapist showed parents videos of them interacting with 
their child to help understand how their baby was trying to communi- 
cate with them, and how to respond. After five months, the team saw 
hints of improvements in the babies’ engagement, attention and social 
behaviour, compared with controls. But the team acknowledged that 
many of the results had wide confidence intervals and that it is too early 
to say whether the intervention will have long-term effects. 

Johnson hopes that investigations in the toddler lab, when they start, 
might also eventually find a practical use, helping researchers to devise 
ways to boost cognitive, attention and memory skills. “I believe we are 
now at a unique point of convergence between this basic science and 
the clinical science,” he says. 

Meanwhile, the techniques continue to evolve. Jones is currently 
piloting ‘gaze-contingent’ tasks, which enable babies to become active 
participants in experiments. “If they can focus their attention on a 
butterfly flying across the screen, and not get distracted by other things 
that are happening, then the butterfly keeps flying, so they get rewarded 
for controlling their attention,” Jones says. A more distant goal is to 
develop ways of using fMRI so that it could be used on awake babies. 
And there are still so many questions that demand answers. How do 
differences in the temperaments of babies develop into more complex 
personality traits as children age? And why can’t people remember their 
earliest months and years? 

Baby Ezra will certainly not remember his day in the lab. By late 
afternoon, his mother is tucking him into the pushchair for his journey 
home — a 1-hour 45-minute journey to Bristol by train. The trip was 
worth it, she says, because she was curious to learn what goes on at the 
Babylab. “I was interested in how Ezra would respond, but also in why 
those tasks were being done,” she says. 

Ezra and his mother now have souvenirs of their day: some photos, 
a certificate of participation and a baby-sized T-shirt. “I’m an infant 
scientist,” it reads. 


Linda Geddes is a freelance writer based in Bristol, UK. 
AND THEN THERE WERE NONE 


Seven centuries ago, tens of thousands of people 
mysteriously fled their homes in the American Southwest. 
Archaeologists are trying to work out why. 


BY RICHARD MONASTERSKY 


Vultures carve lazy circles in the sky as 
a stream of tourists marches down 
a walkway into Colorado's Spruce 
Canyon. Watching their steps, the 
visitors file along a series of switchbacks leading 
to one of the more improbable villages in North 
America — a warren of living quarters, storage 
rooms, defensive towers and ceremonial spaces 
all tucked into a large cleft in the face of a cliff. 


When ancient farmers built these structures 
around the year 1200, they had nothing like the 
modern machinery that constructed the tour- 
ist walkway. Instead, the residents had to haul 
thousands of tonnes of sandstone blocks, cut 
timber and other materials down precarious 
paths to build the settlement, known as Spruce 
Tree House, in Mesa Verde National Park. 

“Why would people live here? That’s an 
important question. It’s not an easy place to 
reach,” says Donna Glowacki, an archaeologist 
now at the University of Notre Dame in Indi- 
ana, as she walks among the ruins. Even more 
perplexing is what happened after they settled 
there. The villagers occupied their cliffside 
houses for just a short time before everyone 
suddenly picked up and left. So did all the 
other farmers living in the Four Corners region 
of the American Southwest, where the mod- 
ern states of Colorado, New Mexico, Utah and 
Arizona meet (see ‘Turbulent times’). 

All together, nearly 30,000 people disap- 
peared from this area between the mid-1200s 
and 1285, making it one of the greatest vanish- 
ing acts documented in human history. What 
had been one of the most populous parts of 
North America became almost instantly a 
ghost land. 

Archaeologists have long puzzled over what 
drove these farmers, the ancestors of the Pueblo 
people, from their homes and fields. “That is 
one of the iconic problems of southwestern 
— and world — prehistory,” says archaeologist 
Mark Varien, who is executive vice-president 
of the Crow Canyon Research Institute 
in Cortez, Colorado. Early scholars blamed 
nomads, the ancestors of the Apache and 
Navajo, for violently displacing the farmers. 
Over the past couple of decades, the main 
explanation has shifted to climate — a profound 
drought and cold snap that hit in the 1270s. 

Cliff Palace, a Pueblo dwelling in Mesa Verde National Park, 
was a thriving village in the 1200s. 

SOURCE: S. ORTMAN J. ARCHAEOL. METH. THEOR. HTTP://DOI.ORG/8T6 (2014). 

But a series of studies by Glowacki, Varien 
and other researchers reveals a much more 
complex answer. The scientists have used 
detailed archaeological analysis, fine-grained 
climatic reconstructions and computer mod- 
els to simulate how ancestral Pueblo families 
would have responded to their environment. 
The interdisciplinary strategy has enabled the 
researchers to examine prehistoric societal 
changes at a level unattainable in most other 
regions. “We have enormous detail on this 
archaeologically. Unparalleled detail,” says 
Steve Lekson, an archaeologist at the Univer- 
sity of Colorado Boulder. 

The emerging picture is one of a society 
rocked by troubles until it eventually toppled. 
More than a century before the Mesa Verde 
villages emptied out, political disruptions 
and a monster drought destabilized the entire 
ancestral Pueblo world. Thousands of peo- 
ple moved into the Mesa Verde region from 
nearby areas, straining the agricultural capac- 
ity of the landscape and eroding established 
cultural traditions. This led to violent conflicts 
that further undermined the society, spurring 
some people to leave. When another drought 
hit in the late 1200s, the remaining population 
departed en masse. 

Political instability, cultural conflict, violence, 
overcrowding and drought. Many of the chal- 
lenges encountered by the ancestral Pueblo 
seem all too familiar in 2015, as hundreds of 
thousands of migrants flee from the Middle East 
and Africa towards Europe. When Glowacki 
looks at the events of more than seven centu- 
ries ago at Spruce Tree House, she sees many 
similarities. “There was a splintering that went 
on and an implosion of this political system. It 
was a rejection, them saying, ‘We can’t live that 
way anymore. There has to be a better way.’” 


STONE WORK 
It was chance that first carried Glowacki into 
the world of the ancestral Pueblo. Before 
starting graduate school, she ended up in a 
summer job as a ranger at Mesa Verde National 
Park, where she fell for the landscape and its 
archaeology. She has spent the past 23 years, 
on and off, researching the region’s ancient 
populations. 

At Spruce Tree House, Glowacki pulls out 
a map showing the latest results of an archi- 
tectural analysis that she is helping the park to 
carry out. The work is laborious — researchers 
sometimes sit in front of a wall of sandstone 
blocks for days, studying the mortar and rocks 
to work out how the structure was first built 
and then altered over time. 

Gradually, a history of the village has taken 
shape, showing that people assembled the 
first set of rooms in the alcove around the year 
1200, and added more right up until the last 
residents abandoned the site around 85 years 
later. The researchers can narrow construction 
dates to within a year or two by analysing tree- 
ring patterns in the wooden support beams 
in the ceilings and then matching them to an 
established tree-ring chronology for the region. 

TURBULENT TIMES 

In the late 1200s, ancestral Pueblo farmers abruptly left 
their villages in the northern San Juan region of the US 
Southwest. Many of them resettled in the northern Rio 
Grande area of New Mexico. 

Before they migrated, the ancestral 
Pueblo built cliff dwellings that 
provided added security. Some of 
the most spectacular are found in 
Mesa Verde National Park. 

Chaco Canyon and the Aztec 
Ruins National Monument area 
were power centres of the Pueblo 
world during the 1000s and 
1100s. When their influence 
waned, it triggered political and 
cultural changes across the region. 

Despite the tedious nature of the work, 
Glowacki says that it never loses its appeal. 
“There are rooms that are fully intact, and you 
can stand in them — and they were built in the 
1240s. In this country, being able to stand in 
something that was built at that time is really 
pretty magical.” 

The cliff dwellings were a last resort for the 
park’s prehistoric Pueblo residents. When 
farmers first arrived in the region around 
AD 600, they settled on the fertile highlands 
above the canyons, which gave them easier 
access to their fields. But by 1200, some- 
thing began to force them over the edge into 
the giant alcoves that naturally form in the 
sandstone cliffs. 

Insights into that shift are emerging thanks 
to a major interdisciplinary effort called the 
Village Ecodynamics Project (VEP), which 
launched in 2002. Funded by the US National 
Science Foundation, the nearly US$2.5-million 
initiative is assessing how social and environ- 
mental factors influenced the populations 
of prehistoric Pueblo farmers from about 
600 to 1300, says Tim Kohler, the VEP’s prin- 
cipal investigator and an archaeologist at 
Washington State University in Pullman. 

Population data for the central Mesa Verde region show 
massive migration away from the area in the late 1200s. 
People flocked to the region in the early 1200s. 
(Horizontal axis: year, 1000 to 1300.) 

In one strand of research, the team drew on 
the rich history of archaeology in the region to 
compile a database of 18,000 prehistoric sites, 
which allowed them to measure the popula- 
tion and how it shifted over time. With such 
a massive database, the researchers could look 
at population changes in narrow time bands 
averaging about 40 years (see ‘All gone’). 

“There are not many places in the world 
where archaeologists can look at changes in 
such discrete slices of time,” says Varien, who 
is a co-principal investigator of the VEP. The 
analysis suggested that people started leaving 
the Mesa Verde region at least 15 years before 
the drought hit. “It looks as though the final 
depopulation began with a trickle and ended 
with a flood,” says Scott Ortman, an archae- 
ologist at the University of Colorado Boulder 
who developed the model for the project’s 
population analysis. 

Another part of the VEP looked at how the 
farmers fed themselves. The researchers used 
temperature and precipitation estimates from 
tree-ring data to create a model of where the 
communities could have grown maize (corn) 
each year, which was their main source of food. 
The calculations of this ‘maize niche’ did a 
good job of explaining how many people set- 
tled in different regions, says Kohler. 

The team’s latest data show that when grow- 
ing conditions improved, the population 
density spiked, more than doubling in some 
regions. But one place defied that pattern: 
Mesa Verde National Park. When farming 
became easier, people actually moved out of 
that area. And, paradoxically, when times grew 
tough, more people moved in. 

Kohler and his colleagues suggest that 
these movement patterns have to do with 
topography. The park stands higher than the 
surrounding landscape, so it gets more precipi- 
tation. And because the highlands tilt to the 
south, cold air drains off, leaving Mesa Verde 
warmer than the surrounding lowlands. So 
when the region faced drought or a cold spell, 
farmers congregated in the more-reliable 
Mesa Verde area — something researchers 
had not appreciated before now, says Kohler. 
“People have been working in this area for 
100 years, and I don't think they ever realized 
it,” he says of such a climate pattern. 


VIRTUAL REALITY 

The VEP researchers have also conjured up 
a virtual version of the past. The team con- 
structed a computer model of the landscape 
and then seeded it with households that could 
grow maize, hunt, collect water and wood and 
move to new sites if they failed to secure enough 
resources. By comparing the simulations to 
the archaeological record, the researchers can 
examine factors that might have driven ancient 
populations to migrate. “It’s really a new way of 
doing archaeology,’ says Varien. 

Kohler says that he sometimes switches 
on the graphics during a simulation to watch 
the behaviour of the dots that represent 
households. Scattered randomly at first, they 
scurry around until their inhabitants can har- 
vest enough resources. Then, they form into 
settlements, which grow rapidly to a point 
when they can no longer sustain themselves — 
and so the households move again. But there is 
a limit to how much Kohler can watch. “Even 
on modern, fast processors, when the agents 
get into the thousands, it slows down and it’s 
no longer fun,” he says. 
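
The household behaviour described above can be sketched in a few lines of Python. This is only a toy illustration, not the VEP's model: the grid size, the yield values, the `need` threshold and the function name `simulate` are all assumptions made for the example. Each household stays put while its share of the local maize yield covers its needs and otherwise moves to the cell that would offer it the best share.

```python
import random


def simulate(n_households=50, size=10, years=20, need=1.0, seed=0):
    """Toy agent-based model: households move when their cell cannot feed them.

    All parameters are hypothetical and in arbitrary units.
    """
    rng = random.Random(seed)
    # Assumed fixed maize yield for every cell of a size x size landscape.
    yield_map = {(x, y): rng.uniform(0.5, 4.0)
                 for x in range(size) for y in range(size)}
    cells = list(yield_map)
    # Households start scattered at random.
    homes = [rng.choice(cells) for _ in range(n_households)]
    moves_per_year = []

    for _ in range(years):
        # Count how many households currently share each cell.
        counts = {}
        for cell in homes:
            counts[cell] = counts.get(cell, 0) + 1

        moves = 0
        for i, cell in enumerate(homes):
            share = yield_map[cell] / counts[cell]
            if share >= need:
                continue  # enough food here, so the household stays
            # Leave the cell, then pick the cell with the best share
            # once this household has joined it.
            counts[cell] -= 1
            best = max(cells,
                       key=lambda c: yield_map[c] / (counts.get(c, 0) + 1))
            counts[best] = counts.get(best, 0) + 1
            homes[i] = best
            if best != cell:
                moves += 1
        moves_per_year.append(moves)

    return homes, moves_per_year


if __name__ == "__main__":
    final_homes, moves = simulate()
    print("moves per simulated year:", moves)
```

Comparing where such simulated households end up with where real sites are found is the kind of check the researchers describe; a mismatch points to factors the model leaves out.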

By comparing the simulations to the actual 
population data, the researchers discovered 
some interesting discrepancies during the 
1100s and 1200s. In the model, the farmers 
spread out farther across the landscape than 
they actually did. So something seems 
to have caused the real ancestral Pueblo to 
huddle together more tightly than expected. 

Kohler and his colleagues wondered 
whether fear might have been a factor. To find 
out, they surveyed the archaeological litera- 
ture and tracked levels of violence in the area 
through time by tallying how many skeletons 
had broken arm bones, fractured skulls or 
other signs consistent with acts of aggression. 
Some had apparently died in massacres, and 
there was even evidence of cannibalism at 
certain sites. 

Between 600 and 1000, the Mesa Verde 
region was relatively peaceful, but rates of 
violence rose in the mid-1000s and spiked 
again in the late 1200s, right before the ancient 
Pueblo left, the researchers reported last year. 
“What we found was that people were more 
clumped up than the model predicted precisely 
in times when there was a lot of violence on the 
landscape,’ says Kohler. 

There has been some scepticism among 
archaeologists about the use of agent-based 
modelling, but Kohler says that it has been 
useful in this case: the inconsistency between 
the simulations and the real data prompted the 
researchers to look at violence in a new way. 
“That disjunction identifies for us interesting 
questions,” he says. 


Most researchers think that the majority of 
violent acts occurred within ancestral Pueblo 
communities: one village attacking another over 
food resources or neighbours turning on each 
other. More than half the skeletons from some 
periods bore signs of trauma. “They are one of 
the most violent societies we've ever studied,” 
says Kohler. 

But not all of their troubles came from within. 
Some unusual-looking projectile points have 
turned up at massacre sites that date to just 
before the Pueblo people left the Mesa Verde 
region, so invading nomads might have had a 
role in forcing the farmers from their homes. 

In the next stage of the VEP project, research- 
ers plan to look at how food shortages might 
have contributed to violence. The new version 
of the agent-based model is more sophisticated 
than the last, allowing households to form 
social groups that compete with each other for 
access to agricultural lands. Leaders can emerge, 
fighting can erupt between groups and people 
can migrate away from Mesa Verde to an area 
farther south in New Mexico, where many 
ancestral Pueblo are thought to have resettled. 

This all amounts to a huge step up in 
processing, so the team will graduate to a 
supercomputer for future simulations, which 
are planned for later this year or early next year. 
Nothing of this scale has been done before in 
the field, says Kohler. “Archaeologists do not 
have the reputation of being users of high-per- 
formance computing environments,” he says. 
“But I don’t think we'll be the end of the road 
for this kind of work.” 

Among the ruins at Spruce Tree House, 
Glowacki takes a different approach. As a col- 
laborator on the VEP project, she does not 
discount the importance of drought and short 
growing seasons. But she focuses on some of 
the other factors that also stressed the ancestral 
Pueblo society. 

The signs are in the houses that fill the 
Spruce Canyon alcove. The architectural- 
documentation project has taught Glowacki 
that the residents there updated their homes 
just as much as people in New York or London 
today. “Even when they were living there, they 
were making changes and adding walls and 
doors and doing all of this remodelling.” 


CULTURE CLASH 

Some of these alterations point to dramatic 
events. In the mid-1200s, structures associated 
with one of the founding families were burned: 
fire damage can be seen in one room and in a 
kiva, a circular depression that served as the 
family’s ceremonial space. The fire does not 
seem to be accidental, Glowacki says. Rather, 
it could have been part of a ritual changeover 
in ownership or it might reflect someone 
forcing out one of the original clans. “At the 
very least, that suggests there were some sig- 
nificant changes in the clans or families that 
were using the structures — or in part of the 
leadership there.” 

Other rooms in the alcove were also burned, 
including a tower that may have served as a 
defensive structure. Taken together, the archi- 
tectural evidence provides a detailed view 
of friction in the village, she says. “There 
was some sort of conflict and people left, 
presumably, and new people came in and 
remade these spaces.” 

Around the Pueblo region, there are many 
signs of cultural change leading up to and dur- 
ing the 1200s. Glowacki, along with some other 
archaeologists, thinks that such adjustments 
had to do with shifting political allegiances in 
that part of the world. 

During the mid-1000s and early 1100s, the 
centre of power among the Pueblo people was 
located about 150 kilometres south of the Mesa 
Verde area, in New Mexico's Chaco Canyon. 
In the 1100s, an extension of the Chaco politi- 
cal order rose up at a site now called Aztec 
Ruins National Monument, midway to Mesa 
Verde. The Chaco-Aztec culture was socially 
stratified, with massive residences in which 
the elites lived. Smaller versions of the elite 
‘great houses’ have been found in villages to 
the north, which reveals the broad influence 
of the Chaco-Aztec political order. 

ROBERT JENSEN/MESA VERDE NATL PARK 

Then, an awful drought between 1130 and 
1150 apparently weakened that order, and 
new types of practice emerged. In the Mesa 
Verde region, some communities built more- 
inclusive spaces, such as open plazas, and they 
removed the roofs from some large kivas, 
allowing broader participation in rituals. 

The changes in public and ceremonial 
spaces demonstrate the waning influence of 
the Chaco-Aztec polity, which had previously 
unified the Pueblo world. “What is happening 
is you have this dissolution and splintering,” 
Glowacki says. That may have contributed 
to the increased violence and served to drive 
farmers from their highland villages towards 
the more-secure alcoves along the cliff faces. 

These political upheavals may also partially 
explain why people started to abandon the 
Mesa Verde area decades before the drought 
of the mid-1270s hit. The combination of 
political instability, social upheaval and then a 
rotten climate was too much to take, she says. 
“It got really bad and really nasty, and they 
wanted to get away from it.” 

In Spruce Tree House, a ladder leads down into a sunken ceremonial space known as a kiva. 

Kohler sees parallels with the collapse of the 
classic Mayan civilization in the ninth century, 
as well as with events in the Middle East today. 
In the case of the Mesa Verde exodus, research- 
ers can look in detail not only at why and when 
people left, but also at what happened after- 
wards. “We need to understand migration 
streams better,” he says. “We have the advan- 
tage of the long view.” 


FINDING PEACE 
Whatever forced the Pueblo to uproot them- 
selves, tens of thousands of people left the 
Four Corners region in search of something 
better. And many apparently found what they 
were looking for. When the exodus began, the 
ancestral Pueblo migrated in several different 
directions: some to the southwest into Arizona 
and some to southern New Mexico. Archae- 
ologists have long suspected that many settled 
along the Rio Grande river in northern New 
Mexico, a couple of hundred kilometres south- 
east of the Mesa Verde region. That hypothesis 
is supported by population data, which show 
that the Rio Grande region became more 
crowded; VEP studies have indicated that 
between 1250 and 1300, the population in this 
area swelled from 8,000 to 18,000 people. By 
the early decades of the 1300s, it was close to 
25,000, Ortman says. 

When they settled in their new home, 
the Mesa Verde people made a clear break 
from their former lives. Analysis by Kohler, 
Ortman and their colleagues shows that rates 
of violence were much lower than before. And 
the Pueblo made social changes as well. “The 
migrants do not appear to be trying to con- 
tinue with the society and traditions of the 
Four Corners. They were trying to leave them 
behind,” says Ortman. The Pueblo villages 
that grew up after 1300 reflect a much more 
communal type of society, in which multiple 
families shared kivas and residents gathered 
in open ceremonial spaces. 

There was also a political change, says 
Lekson, who has studied the elite residences 
at Chaco Canyon and Aztec Ruins. “They 
shucked off all the nobles and the kings, and 
they never had them again. They figured out 
how to run villages without that apparatus.” 

Even today, southwestern Pueblo villages 
continue to embrace an egalitarian society. Ort- 
man finds inspiration in the evolution of Pueblo 
culture after the collapse. “Pueblo people had to 
create those values and institutions that reflect 
them as a result of their past struggles,” he says. 

And that system has been remarkably 
successful. Pueblo villages have retained their 
culture and languages to a much stronger degree 
than most other Native American communi- 
ties, he says. “Some of the Pueblos that emerged 
after the Mesa Verde migration have been able 
to withstand 500 years of European coloniza- 
tion,” says Ortman. “One could say that those 
communities have weathered European colo- 
nization better than almost any other society in 
the world — certainly within the United States.” 

At Spruce Tree House, Glowacki has seen 
how strong those traditions still are. Just a few 
weeks earlier, she took part in a workshop that 
included some teachers who are Pueblo and 
who demonstrated how they grind maize. Even 
that mundane chore took on spiritual dimen- 
sions as the teachers made offerings to their 
ancestors who once inhabited the cliff dwell- 
ing. To the modern Pueblo, the centuries-old 
structures are not abandoned ruins but still 
echo with the spirits of those who came before. 

“It was a really beautiful moment,” says 
Glowacki. “What I think makes Pueblo cul- 
ture really interesting and perhaps unique is 
the long arc of Pueblo history. There's a lot we 
can learn about how a society faces really dif- 
ficult times, adversities — and fundamentally 
reorganizes and transforms their culture.” 


Richard Monastersky is a features editor for 
Nature. 
Make sense of health data 


Develop the science of data synthesis to join up the myriad varieties of health 
information, insist Julian H. Elliott, Jeremy Grimshaw and colleagues. 


If you are wondering whether exposure 
to some chemical could increase your 
chances of getting colon cancer, you could 
easily find supportive evidence from animal 
experiments. You might then discover that 
epidemiological studies tell a different story. 

There have never been more options 
when it comes to measuring factors rel- 
evant to health. We can sequence our 
entire genomes and those of our bacteria, 
viruses and tumours. In principle, every 
visit to the doctor can be tracked from elec- 
tronic medical records. Information on 
physiology, behaviours, diets, movements 
and interactions with others can be extracted 
from wearable devices, smartphone apps and 
social-networking sites. And thanks to the 
open-access movement and a shift in data- 
sharing norms, more data are being made 
publicly available. 

Yet sifting through the information to find 
answers to questions about health is becom- 
ing increasingly difficult, even for the experts. 
The data exist in disparate domains, are gen- 
erated using different methods, and are stored 
in different infrastructures — from the private 
servers of hospitals to global platforms, such 
as dbGaP, an open database of genotypes and 
clinical information. 


POOLING DATA 
We believe that to consolidate data from 
different sources into comprehensive and 
coherent bodies of evidence on which deci- 
sion-makers can act, researchers need to bet- 
ter exploit current methods and tools for data 
synthesis — and to develop superior ones. 
Researchers usually try to obtain insights 
by pooling the same kind of data, such as 
from clinical trials. But because different 
study and data types tend to have distinct 
strengths and weaknesses, a much richer 
understanding can emerge when different 
kinds of information are combined. 

The drug cisapride, for instance, was 
licensed in the United States in 1993 to treat 
heartburn, on the basis of data collected in 
clinical trials over ten years. Yet the drug's 
association with fatal heart-rhythm distur- 
bances was understood only when data 
from clinical trials were consolidated with 
those from large, long-term cohort studies, 
which recorded cisapride’s effects in thou- 
sands of people. 

Likewise, the picture obtained from con- 
ventional influenza surveillance (which 
involves collecting data from primary-care 
clinics) can lag behind what is actually 
happening on the ground. Google collects 
real-time information based on the use of 
search terms related to flu symptoms, but 
these findings can be inaccurate. The best 
insights almost certainly come from aggre- 
gating these different data types. 

So how can we bring together the multi- 
ple, extremely diverse data sets that are now 
becoming available? 

Formal methods for ‘evidence synthesis’ — 
in which multiple sources of data are com- 
bined to obtain new insights — were first 
developed in the social sciences in the 1970s. 
The techniques have since been adapted in 
many branches of science, and they underpin 
high-impact decision-making, for example 
in drug licensing. They generally involve 
identifying and collating all the available and 
relevant data; assessing each data source’s 
strengths and vulnerability to bias; and decid- 
ing how to handle the different sources of 
data depending on their rigour and the ques- 
tion being asked (some data may be excluded, 
for instance). Then, if appropriate, a meta- 
analysis or qualitative assessment can be 
conducted, incorporating the information. 
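
As a rough illustration of the final step, the sketch below pools effect estimates with a fixed-effect, inverse-variance meta-analysis after excluding sources judged to be at high risk of bias. It is a minimal example under stated assumptions, not a method taken from the studies discussed here: the function name `pool_fixed_effect`, the `risk_of_bias` labels and the numbers in `demo_studies` are invented for illustration.

```python
import math


def pool_fixed_effect(studies, exclude_high_risk=True):
    """Pool study effects using inverse-variance (fixed-effect) weights.

    Each study is a dict with keys "effect", "se" (standard error) and
    "risk_of_bias" ("low" or "high"); these field names are hypothetical.
    Returns (pooled_effect, pooled_se, number_of_studies_used).
    """
    kept = [s for s in studies
            if not (exclude_high_risk and s["risk_of_bias"] == "high")]
    if not kept:
        raise ValueError("no studies left after exclusions")

    # More precise studies (smaller standard error) get larger weights.
    weights = [1.0 / s["se"] ** 2 for s in kept]
    total = sum(weights)
    pooled = sum(w * s["effect"] for w, s in zip(weights, kept)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se, len(kept)


# Made-up numbers, for illustration only.
demo_studies = [
    {"name": "trial A", "effect": 0.30, "se": 0.10, "risk_of_bias": "low"},
    {"name": "cohort B", "effect": 0.20, "se": 0.05, "risk_of_bias": "low"},
    {"name": "cohort C", "effect": 0.90, "se": 0.20, "risk_of_bias": "high"},
]

if __name__ == "__main__":
    effect, se, n = pool_fixed_effect(demo_studies)
    print(f"pooled effect {effect:.3f} (SE {se:.3f}) from {n} studies")
```

Excluding a source outright is the bluntest option; a real synthesis might instead down-weight it or model the bias explicitly, depending on the question being asked.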

For example, a UK group combined 
data from clinical trials with those 
from cohort studies in a meta- 
analysis to assess the effective- 
ness of anti-D, a drug given to 
some pregnant women to prevent 
them from producing antibodies 
against their babies. In this case, 
potential sources of bias, such as 
different clinics providing 
care for the women in cohort 
studies, were systematically 
identified, and their impact 
was minimized. 
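
As a rough illustration of the steps described above (collate the sources, grade each for risk of bias, decide which to keep, then pool), the sketch below applies a standard fixed-effect, inverse-variance meta-analysis in Python. It is not the method used by the UK group or by any study cited here. The function name, the three-level bias grading and the exclusion threshold are assumptions made for the example, and the numbers in the usage lines are invented.

```python
import math

# Hypothetical three-level grading of risk of bias (an assumption for this sketch).
BIAS_ORDER = {"low": 0, "moderate": 1, "high": 2}


def pool_fixed_effect(studies, max_bias="moderate"):
    """Pool effect estimates with inverse-variance (fixed-effect) weights.

    Each study is a dict with keys:
      name   - label for the data source
      effect - effect estimate on an additive scale (e.g. a log odds ratio)
      se     - standard error of that estimate
      bias   - 'low', 'moderate' or 'high' risk of bias
    Sources graded worse than max_bias are excluded before pooling.
    Returns (pooled_effect, pooled_se, names_of_sources_kept).
    """
    kept = [s for s in studies if BIAS_ORDER[s["bias"]] <= BIAS_ORDER[max_bias]]
    if not kept:
        raise ValueError("no data source passes the risk-of-bias threshold")
    # Precise estimates (small standard error) receive large weights.
    weights = [1.0 / s["se"] ** 2 for s in kept]
    total = sum(weights)
    pooled = sum(w * s["effect"] for w, s in zip(weights, kept)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se, [s["name"] for s in kept]


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    sources = [
        {"name": "clinical trial", "effect": -0.5, "se": 0.2, "bias": "low"},
        {"name": "cohort study", "effect": -0.3, "se": 0.1, "bias": "moderate"},
        {"name": "uncurated records", "effect": 0.4, "se": 0.1, "bias": "high"},
    ]
    effect, se, used = pool_fixed_effect(sources)
    print(f"pooled effect {effect:.3f} (SE {se:.3f}) from {used}")
```

Excluding a source outright is only one option. A fuller analysis might instead down-weight biased sources or model the bias explicitly, and a random-effects model would be more appropriate when the sources are expected to differ systematically.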

Yet many researchers 
immersed in the combina- 
tion and analysis of 
large data sets that are 
vulnerable to spu- 
rious correlations, 
such as genomic or
electronic-medical-record data, are unaware
of evidence-synthesis tools and their poten- 
tial usefulness. Conversely, many experts in 
evidence synthesis are unfamiliar with the 
methods often used to analyse large data sets 
relevant to health. 

We believe that the core elements of evi- 
dence synthesis must be combined with 
other data sciences to develop new ways to 
make sense of diverse data. 


MANAGING BIAS 

Scientists need to work out why, when and 
how to combine diverse data — for instance, 
should physical-activity data from clinical 
records, online questionnaires and wearable 
devices be combined? As well as addressing 
when and how to combine diverse individual-level
data, scientists need to grasp the
risks of bias associated with each data
type and incorporate such risks into their
analyses. For clinical trials and observational
studies of the effects of interventions,
analysts can use the Cochrane Risk of Bias 
approach. Similar methods are needed to 
enable the detection and reduction of bias in 
other data types, such as social-networking 
and mobile-phone data. 

Also needed are agreed ways to capture 
and represent information on potential 
sources of bias. Organizations investing in 
infrastructure and standards for health data, 
such as Health-Level 7, need to incorporate 
this layer of metadata (data about data) into 
their systems. 

Methods to deal with bias must be incor- 
porated into new analytical systems devel- 
oped to guide decision-making in health 
care — including those based on natural- 
language processing and machine learning. 
Transparent and independent evaluations 
of these new systems will also be important, 
although challenging to achieve for 
proprietary systems such as IBM
Watson.

In the short to medium term, 
conferences, funding programmes
and a restructuring
of departments in universities
and institutes will be crucial to sup- 
port collaborations between com- 
putational biologists, computer 
scientists, clinical and population- 
health researchers and spe- 
cialists in evidence synthesis. 
For instance, major granting 
agencies should invest in dedicated 
research-methods programmes simi- 
lar to that of the UK National Institute 
for Health Research. Targeted investment 
will also be needed to develop data 
infrastructure in poor regions and 
countries. In the long term, a new 
type of analyst, adept at appraising and combining
diverse data types appropriately, may
emerge. 


JOINING THE DOTS 

What could these shifts mean in practice? 
One of the aims of the US Precision Medi- 
cine Initiative (PMI) is to prevent people from 
getting cancer. This means understanding the 
effects of myriad genomic, behavioural and 
environmental factors and their interactions. 
The value of the initiative will be enhanced if 
data from these very different domains can be 
combined appropriately and easily. 

Another aim of the initiative is to develop 
new cancer therapies. Better systems for data 
synthesis would inform drug development 
with richer and more accurate insights from 
the ‘omics’ sciences, animal studies and early 
human trials. Moreover, health-care funders 
such as Britain’s National Health Service and 
Medicare in the United States could better 
understand a drug’s benefits and harms in 
the real world by synthesizing data from 
clinical trials, cohort studies, patient expe- 
riences reported through mobile and social 
applications, and drug-surveillance systems. 
(These include the US Sentinel Initiative and 
the Canadian Network for Observational 
Drug Effect Studies, which pool data from 
different health-care systems to monitor the 
adverse effects of licensed drugs.) 

We are not proposing a one-model-fits-all
approach. But society does not need more 
islands of data analysis that support con- 
flicting inferences. As large and diverse data 
sets become ever more plentiful, we must 
ensure that rigorous and trustworthy meth- 
ods to make sense of the data are developed 
in parallel.

Build digital democracy


Open sharing of data that are collected with smart devices would empower citizens 
and create jobs, say Dirk Helbing and Evangelos Pournaras. 


Fridges, coffee machines, toothbrushes,
phones and smart devices are all now
equipped with communicating sensors.
In ten years, 150 billion ‘things’ will connect
with each other and with billions of people.
The ‘Internet of Things’ will generate data volumes
that double every 12 hours rather than
every 12 months, as is the case now.

Blinded by information, we need ‘digital
sunglasses’. Whoever builds the filters to
monetize this information determines what
we see (Google and Facebook, for example).
Many choices that people consider their
own are already determined by algorithms.
Such remote control weakens responsible,
self-determined decision-making and thus
society too.

The European Court of Justice’s ruling
on 6 October that countries and companies
must comply with European data-protection
laws when transferring data outside the
European Union demonstrates that a new
digital paradigm is overdue. To ensure that
no government, company or person with
sole control of digital filters can manipulate
our decisions, we need information systems
that are transparent, trustworthy and
user-controlled. Each of us must be able to 
choose, modify and build our own tools for 
winnowing information. 

With this in mind, our research team at 
the Swiss Federal Institute of Technology in 
Zurich (ETH Zurich), alongside international 
partners, has started to create a distributed, 
privacy-preserving ‘digital nervous system’
called Nervousnet. Nervousnet uses the sensor
networks that make up the Internet of
Things, including those in smartphones, to
measure the world around us and to build a
collective ‘data commons’. The many challenges
ahead will be best solved using an
open, participatory platform, an approach 
that has proved successful for projects such 
as Wikipedia and the open-source operating 
system Linux. 


A WISE KING?

The science of human decision-making is 
far from understood. Yet our habits, routines
and social interactions are surprisingly
predictable. Our behaviour is increasingly
steered by personalized advertisements and 
search results, recommendation systems 
and emotion-tracking technologies. Thousands
of pieces of metadata have been collected
about every one of us (see go.nature.com/stoqsu).
Companies and governments
can increasingly manipulate our decisions,
behaviour and feelings.

Many policymakers believe that personal 
data may be used to ‘nudge’ people to make 
healthier and environmentally friendly 
decisions. Yet the same technology may 
also promote nationalism, fuel hate against 
minorities or skew election outcomes if ethical
scrutiny, transparency and democratic
control are lacking, as they are in most
private companies and institutions that use
‘big data’. The combination of nudging with
big data about everyone's behaviour, feelings
and interests (‘big nudging’, if you will) could
eventually create close to totalitarian power. 

Countries have long experimented with 
using data to run their societies. In the 1970s, 
Chilean President Salvador Allende created 
computer networks to optimize industrial
productivity. Today, Singapore considers
itself a data-driven ‘social laboratory’ and
other countries seem keen to copy this model. 

The Chinese government has begun rating 
the behaviour of its citizens. Loans, jobs and
travel visas will depend on an individual’s
‘citizen score’, their web history and political
opinion. Meanwhile, Baidu (the Chinese
equivalent of Google) is joining forces with
the military for the ‘China brain project’,
using ‘deep learning’ artificial-intelligence
algorithms to predict the behaviour of people
on the basis of their Internet activity.

The intentions may be good: it is hoped 
that big data can improve governance by 
overcoming irrationality and partisan inter- 
ests. But the situation also evokes the warn- 
ing of the eighteenth-century philosopher 
Immanuel Kant, that the “sovereign acting...
to make the people happy according to
his notions... becomes a despot”. It is for this
reason that the US Declaration of Independ- 
ence emphasizes the pursuit of happiness of 
individuals. 

Ruling like a ‘benevolent dictator’ or ‘wise
king’ cannot work because there is no way 
to determine a single metric or goal that a 
leader should maximize. Should it be gross 
domestic product per capita or sustainability, 
power or peace, average life span or happi- 
ness, or something else? 

Better is pluralism. It hedges risks and promotes
innovation, collective intelligence and
well-being. Approaching complex problems
from varied perspectives also helps people to 
cope with rare and extreme events that are 
costly for society — such as natural disasters, 
blackouts or financial meltdowns. 

Centralized, top-down control of data has 
various flaws. First, it will inevitably become 
corrupted or hacked by extremists or crimi- 
nals. Second, owing to limitations in data- 
transmission rates and processing power, 
top-down solutions often fail to address local 
needs. Third, manipulating the search for 
information and intervening in individual 
choices undermines ‘collective intelligence’.
Fourth, personalized information creates
‘filter bubbles’. People are exposed less to
other opinions, which can increase polarization
and conflict.

Fifth, reducing pluralism is as bad as 
losing biodiversity, because our economies 
and societies are like ecosystems with mil- 
lions of interdependencies. Historically, 
a reduction in diversity has often led to 
political instability, collapse or war. Finally, 
by altering the cultural cues that guide people’s
decisions, everyday decision-making
is disrupted, which undermines rather than 
bolsters social stability and order. 

Big data should be used to solve the 
world’s problems, not for illegitimate manip- 
ulation. But the assumption that ‘more data 
equals more knowledge, power and success’
does not hold. Although we have never had
so much information, we face ever more 
global threats, including climate change, 
unstable peace and socio-economic fragility, 
and political satisfaction is low worldwide. 
About 50% of today’s jobs will be lost in the 
next two decades as computers and robots 
take over tasks. But will we see the macro- 
economic benefits that would justify such 
large-scale ‘creative destruction’? And how 
can we reinvent half of our economy? 

The digital revolution will mainly benefit 
countries that achieve a ‘win-win-win’ situation
for business, politics and citizens alike.
To mobilize the ideas, skills and resources 
of all, we must build information systems 
capable of bringing diverse knowledge
and ideas together. Online deliberation
platforms and reconfigurable networks of
smart human minds
and artificially intelligent systems can now
be used to produce collective intelligence 
that can cope with the diverse and complex 
challenges surrounding us. 


A DIGITAL NERVOUS SYSTEM

The Nervousnet project is working on this.
It began as a tool for scientists to experiment 
with the Internet of Things. For example, 
social interactions can be studied by anonymously
tracing the physical proximity of people
(given their informed consent).

Nervousnet now enables anyone to measure
and analyse aspects of the world in real
time. The Nervousnet app allows users to 
activate or deactivate about ten smartphone 
sensors that measure, for example, accelera- 
tion, light and noise. A range of other func- 
tions are being shaped by the core research 
and development team at ETH Zurich and 
about a dozen research groups in Europe, 
Japan and the United States. The project is 
funded by the European Commission, Delft 
University of Technology in the Netherlands 
and philanthropists. It is also supported by 
volunteer developers. We aim for global col- 
laboration and benefits, even if there will be 
different variants in the end (as happened for 
Unix operating systems, for example).

Unlike initiatives for the Internet of
Things spearheaded by big technology 
companies, Nervousnet is run as a ‘citizen 
web, built and managed by its users. Inspired 
by Wikipedia and OpenStreetMap, people 
can interact with Nervousnet in three ways. 
They can contribute data, analyse the crowd- 
sourced data sets, and share code and ideas. 
Anyone can create data-driven services 
and products using a generic program- 
ming interface. The aim is to yield societal 
benefits, business opportunities and jobs.

Several Internet of Things platforms and
data-science projects share Nervousnet’s 
vision; none has its scope. They focus on
participatory data collection; decentral- 
ized communication services; or big-data 
analytics. Nervousnet is designed to meet 
all three objectives. It will also enable real- 
time measurement and feedback to support 
self-organizing systems. For example, self- 
controlled traffic lights responding to local 
vehicle flows can reduce urban congestion 
and outperform today’s centralized systems. 

Nervousnet uses distributed data storage 
and distributed control, so that it is resilient 
to attacks and centralized manipulation 
attempts, easy to scale up, and tolerant to 
faults. Because data encryption is not enough, 
a secure personal-data store will be needed 
to allow each user to determine which data 
to share with whom, and for what purpose. 
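
To make the idea of a personal-data store concrete, here is a minimal Python sketch in which a user grants or revokes permission for one sensor's data to go to one recipient for one stated purpose. It is not Nervousnet's actual design or programming interface, which the text does not describe; every class, method and example name below is hypothetical, and a real store would also need authentication, encryption and audit logging.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grant:
    """One permission: which data, shared with whom, for what purpose."""
    sensor: str
    recipient: str
    purpose: str


@dataclass
class PersonalDataStore:
    """Hypothetical user-controlled store of sharing permissions."""
    grants: set = field(default_factory=set)

    def allow(self, sensor: str, recipient: str, purpose: str) -> None:
        # The user opts in to one specific sensor/recipient/purpose combination.
        self.grants.add(Grant(sensor, recipient, purpose))

    def revoke(self, sensor: str, recipient: str, purpose: str) -> None:
        # Withdrawing consent removes the grant; revoking twice is harmless.
        self.grants.discard(Grant(sensor, recipient, purpose))

    def may_share(self, sensor: str, recipient: str, purpose: str) -> bool:
        # Default is deny: sharing requires an exact matching grant.
        return Grant(sensor, recipient, purpose) in self.grants


if __name__ == "__main__":
    store = PersonalDataStore()
    store.allow("noise", "city-lab", "traffic study")
    print(store.may_share("noise", "city-lab", "traffic study"))
    print(store.may_share("noise", "city-lab", "advertising"))
```

The deny-by-default check is the design choice that matters here: data for which the user has not given an exact grant, including the same sensor requested for a different purpose, is never released.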

Attracting users is a challenge. We will 
be adding elements of gaming to make 
participation more enjoyable, as well as a 
micro-payment system to reward and incen- 
tivize digital co-creation. Because critics may 
worry about the responsible use of bottom- 
up systems, Nervousnet will integrate repu- 
tation systems, qualification mechanisms 
and self-governance by community mod- 
erators. 

In the long run, measurements tailored 
to specific purposes and a combination of 
crowdsourced data generation, curation and 
analysis will outperform the currently fash- 
ionable big-data analytics approach. Just as 
the open standards of the World Wide Web 
created unprecedented opportunities and a 
multibillion-dollar economy, the right frame- 
work for the Internet of Things and digital 
society could foster an age of prosperity.
Detainees are held at the United States’ Guantanamo Bay detention camp in Cuba in 2002.


Tortured reasoning 


Lasana T. Harris commends a book exposing the lack of 
scientific basis to ‘enhanced interrogation techniques’.


Why Torture Doesn’t Work: The Neuroscience of Interrogation
SHANE O'MARA
Harvard University Press: 2015.

In 2009, following the abuse of prisoners
at its Guantanamo Bay detention camp,
the US government made a significant
decision. It moved the responsibility for
‘enhanced interrogation techniques’ from the
CIA to a new government organization: the
High-Value Detainee Interrogation Group
(HIG). The move upset many CIA insiders;
torture had been in their toolkit since the
early days of the cold war. The remarks of one
official at a HIG-organized conference on
torture in Washington DC can be summed up
as: how could a new agency, created to both
conduct and study torture, replace the decades
of practice and perfection attained by
the CIA? By adding a scientific component,
responded the newly appointed head of the
HIG.

This exchange highlights the theme of
neuroscientist Shane
O’Mara’s Why Torture Doesn't Work. Rightly,
O’Mara takes a moral stand against torture 
(forced retrieval of information from the 
memories of the unwilling). However, instead 
of simply providing utilitarian arguments, he 
argues that there is no evidence from psychol- 
ogy or neuroscience for many of the specious 
justifications of torture as an information- 
gathering tool. Providing an abundance of 
gruesome detail, O'Mara marshals vast, useful 
information about the effects of such prac- 
tices on the brain and the body. 

For instance, he explains why, physiologi- 
cally, it is ludicrous to claim that stress, pain 
and fear will coerce a suspect to surrender 
critical information. The prolonged release of 
stress hormones such as cortisol damages the 
hippocampus — a brain structure crucial for 
encoding and retrieving memories — as well 
as the prefrontal cortex, which is implicated 
in decision-making and executive control 
processes. Such damage works in opposition 
to the goal of torture. Furthermore, chronic 
stress creates a negative feedback loop, caus- 
ing enlargement and hyperresponsiveness of 
the amygdala, the brain structure that under- 
lies emotional salience, directs attention, 
enables learning and communicates with
most of the brain. 

Another striking example that O’Mara 
discusses is the effect on the brain of sleep 
deprivation. The practice was described in 
the ‘Torture Memos’, legal memoranda
drafted in 2002 by US deputy assistant 
attorney general John Yoo, advising the CIA 
and President George W. Bush on the use of 
torture. Officially limited to a maximum of 
180 hours, and often combined with physical 
restraint, isolation, starvation and beatings, 
sleep deprivation has been used to coerce 
subjects into revealing information. 

The memos further argue that sleep 
deprivation is harmless. O’Mara, however, 
discusses research suggesting that it erodes 
memory processes and general cognitive 
function by flooding the brain with gluco- 
corticoid hormones. Even military scientists 
have produced literature that admits psycho- 
physiological issues with sleep deprivation. 
In 1990, Paul Naitoh and his colleagues at 
the US Naval Health Research Center in San 
Diego, California, published evidence that 
the practice leads to an increase in circulat- 
ing stress hormones and the development of 
psychomotor epileptic discharges (P. Naitoh 
et al. Occup. Med. 5, 209-237; 1990). They 
argued, too, that if combined with other 
stressors, such as food and water deprivation 
and waterboarding, sleep deprivation could 
negatively affect respiratory–cardiovascular
function. 

Yet some officials and politicians continue 
to make announcements that run counter 
to such scientific evidence. Former Penn- 
sylvania senator and Republican presiden- 
tial hopeful Rick Santorum, for instance, 
commented in a 2011 interview that after 
being broken, people become cooperative. 
Most shocking may be this year’s revelation 
that a handful of officials in the American 
Psychological Association were complicit in 
torture by the United States after the Sep- 
tember 2001 attacks on New York and the 
Pentagon, thus providing a veil of scientific 
legitimacy to the practice. 

Torture also affects the torturer. The 
cognitive dissonance required to inflict suf- 
fering results in symptoms similar to those of 
post-traumatic stress disorder, O’Mara warns. 
He cites Joshua Phillips’s None of Us Were Like 
This Before (Verso, 2010), which describes 
how many US veterans who had engaged in 
torture in Iraq experienced intense guilt or 
turned to substance abuse once back in the 
United States. Interviews with former interro- 
gators in Northern Ireland, published by Ian 
Cobain in Cruel Britannia (Portobello, 2012), 
reveal that many believed what they had done 
was wrong, but saw it as a desperate attempt to 
end the violence engulfing their society. 

Given that information obtained under 
torture is rarely reliable (because the victim
will generally say anything to make
the pain stop), O'Mara recommends
an alternative: conversation. Having a 
conversation with a detainee may yield 
results comparable, and probably supe- 
rior, to those obtained from torture. He 
cites three pieces of evidence. 

First is a 1993 study by Stephen Moston 
and Terry Engelberg of police interroga- 
tions, which found that of more than 1,000 
detainees, only 5% refused to talk (S. Mos- 
ton and T. Engelberg Polic. Soc. 3, 223- 
237; 1993). Second, research by Robin 
Dunbar and his colleagues finds that 40% 
of what we reveal in conversation is related 
to the self, suggesting that refusing
to self-disclose is very difficult
(R. I. M. Dunbar et al. Hum. Nat. 8,
231-246; 1997). Third, a study by
Diana Tamir and Jason Mitchell
showed that people
are willing to forgo 
money to talk to others about themselves. 
Indeed, the nucleus accumbens (part 
of the brain’s reward circuitry) activates 
during such an opportunity, suggesting 
that people find disclosure intrinsically 
rewarding (D. I. Tamir and J. P. Mitchell 
Proc. Natl Acad. Sci. USA 109, 8038-8043; 
2012). O’Mara does acknowledge that the 
difficulties of having such a conversation 
with a non-compliant person demand 
advanced social skills that are compara- 
ble to those of clinical psychologists and 
psychiatrists, who often deal with non- 
compliant patients. He suggests that alter- 
native approaches such as virtual reality 
and role playing may be useful for infor- 
mation gathering during interrogation. 

Why then, given its uselessness in 
eliciting valuable information, do peo- 
ple torture? It is a form of vengeance or 
punishment, intended to discourage the 
victim from future transgressions and to 
communicate to others that harm will 
not be tolerated. In some cases, it occurs 
because the torturer believes that ter- 
rorists have mental illnesses. In science, 
however, punishment is not a viable 
response to someone with such an illness 
— just as torture is not a viable method 
for gathering information, as O’Mara 
repeatedly points out.


Lasana T. Harris is a senior lecturer in 
experimental psychology at University 
College London, and a guest lecturer in 
social and organizational psychology at 
Leiden University in the Netherlands. 
He studies the neuroscience of 
dehumanization and prejudice. 

e-mail: lasana.harris@ucl.ac.uk 
Steve Jobs (Michael Fassbender) confronts his daughter Lisa (Perla Haney-Jardine) in Steve Jobs.


A binary life 


A polished biopic of tech titan Steve Jobs fails to plumb 
fully his inner contradictions, finds Timo Hannay. 


The closest I came to meeting Steve
Jobs was in the late 2000s, shortly
after the birth of the iPhone. I was
attending Foo Camp, a California mustering
of digital demigods. Jeff Bezos of Amazon 
was a regular; the year before, Google co- 
founder Larry Page had turned up in his 
helicopter. Everyone but me took such 
things in their stride. That year, however, 
there was something different in the air: a 
rumour had spread that Steve Jobs himself 
might join us. He never showed up, but 
such was his unique status that even his 
absence generated more excitement than 
the presence of other tech giants. 

Blessed as he was with formidable taste 
and rock-star showmanship, Jobs was always 
going to stand out from the crowd of awk- 
ward nerds (like me) who populate much 
of the technology landscape. Add to this 
his death at the height of his powers, and 
we have all the ingredients of a legend. This 
is not undeserved. Many technologists talk 
of changing the world; Jobs actually did so. 
More than anyone else, he broke down the 
barriers between technology
and humanity, […]
products. Then, with
the iPhone, he pulled
off the reverse, turning
an established consumer product into a
computer.

Steve Jobs
WRITTEN BY AARON SORKIN
Universal: 2015.

How best to understand such a life? Jobs’s
answer was to invite high-flying writer and
former media executive Walter Isaacson
to pen his biography, a superb account
published within days of Jobs’s death. Steve
Jobs (Simon and Schuster, 2011) is likely
to remain the closest we will ever get to a
definitive account.

The film version of Isaacson’s blockbuster
is a highly competent creation,
as you would expect from writer Aaron
Sorkin (The Social Network, The West
Wing) and director Danny Boyle (Slumdog
Millionaire, Trainspotting). The dialogue
zips along at 100 beats per minute; the acting
(especially by Michael Fassbender in
the title role) is at times outstanding; and
the direction is as slick as that of any other 
Hollywood offering. Yet many people 
will watch this film to better understand 
its subject — and by that measure, it falls 
short. 

The plot hinges on Jobs’s relationship 
with his daughter, Lisa Brennan-Jobs, and 
plays on the contrast between his lavishing 
of obsessive attention on his latest electronic 
brainchild and his ignoring, or disown- 
ing, of his flesh and blood. It does this by 
going backstage at three seminal product 
launches: those of the Macintosh in 1984, 
the NeXT Computer in 1988 and the iMac 
in 1998. This convenient three-act structure, 
which catches Jobs at three key moments in 
his life, also serves as a metaphor for the 
contrast between his suave public persona 
and his chaotic life. 

This leaves a lot out. And therein lies 
the main weakness of this film: there are 
umpteen other contradictions to explore 
in Jobs. He was simultaneously a hippy 
and a control freak. He was an ascetic 
drawn to mysticism who built the world’s 
preeminent consumer-products company. 
He was egocentric and impossible, inspir- 
ing both incredible feats of engineering 
(starting with the design of the Apple II by 
co-founder Steve Wozniak) and deep affec- 
tion (despite frequently taking credit for the 
work of Wozniak and others). 

Of course, covering all this ground in a 
two-hour film would be difficult. But the 
setting means that Jobs's close colleagues, 
relatives and key antagonists must all be at 
the launches with him, wanting to discuss 
their gripes in the same few minutes before 
he is due to step on stage. (In one amusing 
‘meta’ moment, Fassbender actually notes
precisely this.) This frequently stretches 
credibility too far. 

The first two-thirds of the film thus strug- 
gle to engage — and will probably confuse 
people unfamiliar with the story and the cast 
of characters. It includes plenty of wonder- 
fully quotable lines and aphorisms from the 
book, such as Jobs's burning desire to “put 
a dent in the universe”. But the rat-a-tat-tat 
form feels more like a collage than a coherent 
narrative. In short, it could have done with a 
dose of Jobsian minimalism. That said, the 
film redeems itself in the third act — rather 
like Jobs's career. 

If you want an impressionistic, almost 
dreamlike montage of key moments in Jobs's 
life, see Steve Jobs. If you want to understand 
Jobs the man, you will be disappointed. But 
see the film anyway: it makes a great trailer 
for the book.


Timo Hannay is the founder of SchoolDash, 
an education data analytics firm based in 
London. 

e-mail: timo@hannay.net 


Books in brief 


Will Africa Feed China? 

Deborah Brautigam OXFORD UNIVERSITY PRESS (2015) 

Starting in 2008, China — with more than 20% of the global 
population and just 9% of the arable land — was said to be buying 
up swathes of African farmland. In her cogent analysis, international- 
development specialist Deborah Brautigam cuts her own swathe 
through myths about this relationship. She marshals fresh case 
studies to reveal that Chinese companies own just 250,000 hectares of 
African land, while the country has no government policy on overseas 
farming. Far from being the first ripple of an imperial storm, she 
argues, Chinese interests in Africa largely follow in Western footsteps. 


The Heart Goes Last: A Novel 

Margaret Atwood BLOOMSBURY (2015) 

Stan and Charmaine struggle to survive in a squalid, lawless near 
future. The Positron Project, a social experiment in which they spend 
alternating stints in prison and suburbia, seems to offer a way out — 
at first. Doyenne of speculative fiction Margaret Atwood is on grimly 
hilarious form here as tour guide to a macabre society given over to 
unregulated science, social cleansing, identity loss and profiteering. 
She prods satirically at issues from industrial farming (headless-chicken
production aimed at “meat growth efficiencies”) to sexbots,
and even fits in a subplot featuring a horde of Elvis impersonators. 


Failure: Why Science Is So Successful 

Stuart Firestein OXFORD UNIVERSITY PRESS (2015) 

Biologist Stuart Firestein’s energetic sequel to Ignorance (Oxford
University Press, 2012) explores the centrality of failure in the 
scientific endeavour. Naturalist Ernst Haeckel’s erroneous ideas 
about ontogeny and phylogeny, for instance, helped to spawn the 
field of embryology. Firestein ranges widely, looking at failure in 
contexts ranging from pharma to funding. At base, however, this 
is a close examination of how repeated failure refines problems, 
clarifying the way forward — a challenge that in turn sparks the 
courage and clarity of mind needed for incisive investigation. 


Natural Histories: 25 Extraordinary Species That Have Changed 
Our World

Brett Westwood and Stephen Moss JOHN MURRAY (2015)

Based on an eponymous BBC Radio 4 series, this collaboration with 
London’s Natural History Museum explores the biology and cultural 
histories of selected flora and fauna. Naturalist Brett Westwood and 
writer Stephen Moss present an idiosyncratic list, including mandrill, 
oak, coral, cockroach and whale. Out of myriad gripping stories, their 
take on the lion resonates: the imposing beast may be a cultural 
ubiquity, yet African populations have diminished catastrophically 
from 400,000 in 1950 to fewer than 30,000 today. 


Breaking the Chains of Gravity: The Story of Spaceflight before 
NASA 

Amy Shira Teitel BLOOMSBURY SIGMA (2015) 

In this straightforward chronicle, science journalist Amy Shira Teitel 
traces NASA's ‘prequel’. However familiar, the early discoveries of 
rocketeers such as Romanian physicist Hermann Oberth still thrill, 
as does (in a very different way) the crucial input of former Nazi and 
rocket designer Wernher von Braun. Teitel delivers on detail, such 
as the exploits of supersonic-flight pioneer Chuck Yeager; but the 
whole needs more synthesis and never quite soars. Barbara Kiser 
Volkswagen and
the road to Paris 


In the wake of the Volkswagen 
emissions-testing scandal (see 
Nature http://doi.org/723; 2015), 
this month's climate summit
in Paris needs to roll out an
international framework for 
regulating emissions — with 
strong incentives and tough 
penalties. Voluntary national 
measures are no longer enough. 

Volkswagen's gaming of 
emissions testing underscores the 
urgency of reinventing transport. 
Governments must implement 
electric transport systems 
and plan for combustion-free 
inner cities, by expanding such 
schemes as London’s ultralow- 
emission zone, due in 2020. 
Chancellor Angela Merkel could 
fast-track zero-emission zones 
for Berlin, Hamburg, Munich 
and Frankfurt by 2025, for 
example — restoring Germany's 
environmental lead. 

The Volkswagen debacle 
should be treated as an Enron 
moment for sustainability 
measurement and valuation, 
with a comparable overhaul of 
the requirements for corporate 
accounting and evaluation. 
Programmes such as the 
Redefining Value initiative of 
the World Business Council 
for Sustainable Development 
can capture environmental 
externalities, including impacts 
on climate, biodiversity and 
health. Now we just need 
governments to incorporate 
these into their regulatory and 
stock-market requirements. 
Gail Whiteman, Harry Hoster 
Lancaster University, UK. 
g.whiteman@lancaster.ac.uk 


DEFRA responds to 
badger-cull critique


In calculating the effectiveness 
of the latest UK badger-culling 
targets for controlling bovine 
tuberculosis, Christl Donnelly
and Rosie Woodroffe do not 
consider the uncertainties in 
estimating badger populations
or how information collected
during culling is used to evaluate 
the success of culls in real time 
(Nature 526, 640; 2015). 

Experience shows that there 
is greater uncertainty associated 
with badger population estimates 
than previously thought. A 
post-cull assessment by the 
Department for Environment, 
Food and Rural Affairs (DEFRA), 
using the number of badgers 
removed and reductions in sett 
occupancy, suggests that badger 
abundance may have been 
overestimated. Using the mean 
of the population estimate to 
establish a minimum number to 
be culled, as implied in Donnelly 
and Woodroffe’s calculations, 
leads to a high probability of a
culling objective that could greatly 
exceed actual badger numbers. 

The current culls use methods 
similar to a trial that ran from 
1998 to 2006 in southwest 
England and the west Midlands. 
The trial achieved a roughly 70% 
reduction in badgers, with large 
variance between trial zones, 
based on post-hoc assessments. 
Applying similar culling effort 
to a zone should converge on a
similar outcome to the trial. 

To reduce the badger 
population by a similar 
proportion as in the trial 
while providing an achievable 
objective, the government has 
set an initial minimum culling 
number at the lower end of the 
estimated population range. 
Information gathered during the 
cull will be used to assess whether 
this number should be increased. 
The most up-to-date data about 
the badger population are used 
to assess whether the culls are 
removing enough badgers and 
are therefore likely to achieve a 
similar outcome to the trial. 
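
The effect of choosing the mean rather than the lower end of the estimated range can be illustrated with a minimal sketch. All figures below (the population range, the assumed actual population and the function names) are hypothetical and are not taken from DEFRA data; the 70% reduction is the rough figure reported for the trial.

```python
def minimum_cull_target(population_estimate: float, reduction: float = 0.70) -> int:
    """Minimum number to remove to achieve the given proportional reduction."""
    return round(population_estimate * reduction)


def target_exceeds_population(target: int, actual_population: int) -> bool:
    """True when the culling objective is larger than the animals actually present."""
    return target > actual_population


# Hypothetical figures for one cull zone (assumptions for illustration only).
low_estimate, high_estimate = 400, 1200
mean_estimate = (low_estimate + high_estimate) / 2  # midpoint of the range
actual_population = 500  # assumed true abundance, below the mean estimate

target_from_mean = minimum_cull_target(mean_estimate)
target_from_low = minimum_cull_target(low_estimate)

print(target_from_mean, target_exceeds_population(target_from_mean, actual_population))
print(target_from_low, target_exceeds_population(target_from_low, actual_population))
```

With these assumed numbers, the mean-based minimum (560) is larger than the animals actually present, whereas the minimum set at the lower end of the range (280) remains achievable and can be raised as information accumulates during the cull.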

Comparison with control 
zones, where there has been no 
culling, provides no indication 
that culling has increased disease 
in cattle, as was widely predicted 
in advance of the culls (see 
go.nature.com/grk4ri). 

Ian L. Boyd University of 
St Andrews, UK; and DEFRA, UK. 
ilb@st-andrews.ac.uk 
China emissions: stop
subsidizing emitters 


China needs to resolve its 
conflicting policies on reducing 
carbon emissions and on 
increasing economic growth if it 
is to implement a cap-and-trade 
system successfully (see Nature 
526, 13-14; 2015). 

For example, the government 
subsidizes several industries that 
are big energy consumers and 
generate excessive emissions and 
pollution. China's coal-driven iron 
and steel industry is one such 
case, despite its overcapacity, low 
profit and vicious competition. 

By June 2015, six months after 
China's revised Environmental 
Protection Law came into effect, 
2,556 listed companies were in 
receipt of government subsidies 
that totalled 250 times more 
than the fines for environmental 
damage (see go.nature.com/ 
ozprpce (in Chinese) and D. Liu 
Nature 525, 321; 2015). 

These absurd subsidies hamper 
the transformation of industry to 
cleaner production and distort 
resource allocation through local 
protectionism and lobbying. 
They should be backed by firmer 
legislation or abolished. 

Xin Miao Harbin Institute of 
Technology, Harbin, China. 
xin.miao@aliyun.com 


China emissions: 
alter energy markets 


China has issued a nationwide 
cap-and-trade programme and 
a series of laws to cut its carbon 
emissions by 40-45% between 
2005 and 2020 (see Nature 526, 
13-14; 2015 and G. Wagner 
et al. Nature 525, 27-29; 2015). 
The realities of running such 
complicated systems and pricing 
schemes are daunting, however. 
Obstacles include promotion 
of local government officials, 
which depends not on how well 
they protect the environment 
but on how they help to develop 
the economy. And more 
commercial incentives are 
needed for China to implement
ways to reduce emissions.
Although the government 
has vowed to make the energy 
sector more accountable in 
market terms, administrative 
interventions continue to be 
the norm. The energy market is 
dominated by monopolies, and 
prices are tightly controlled by the 
administration. These problems 
must be addressed if China is to 
use its resources efficiently. 
Dayuan Li, Shenggang Ren 
Business School of Central South 
University, Hunan, China. 
Xiaohong Chen Hunan 
University of Commerce, China. 
bigolee@163.com 


Europe’s first ‘3Rs’ 
governmental centre 


In September, the German 
government opened a nationwide 
centre at the Federal Institute for 
Risk Assessment that is legally 
committed to protecting animals 
used for scientific purposes. The 
initiative is the first of its kind
in Europe and is scientifically
independent of executive and
political advisory bodies. It
aims to encourage greater
transparency and raise standards 
of animal welfare by adopting an 
interdisciplinary approach. 

Known as Bf3R (www.bf3r.de), 
it will encourage European 
research to meet the ‘3Rs’ targets 
for animal experimentation 
(for replacement, reduction and 
refinement; see go.nature.com/ 
yidbm2). It will lead the way in 
enforcing the country’s Animal 
Welfare Act and European 
Directive 2010/63/EU on the 
protection of lab animals. 

Bf3R will also advise on legal 
and other requirements, helping 
authorities and researchers across 
Europe to communicate proper 
animal-protection practice to 
other scientists and to the public. 
Gilbert Schönfelder Federal
Institute for Risk Assessment 
(BfR); and Charité — University 
Medicine Berlin, Germany. 
Barbara Grune, Andreas 
Hensel BfR, Germany. 
gilbert.schoenfelder@bfr.bund.de 
Australia at the crossroads


A modelling study argues that comprehensive policy change could limit Australia’s environmental pollution while 
maintaining a materials-intensive path to economic growth. But other paths are worth considering. SEE ARTICLE P.49 


BENJAMIN L. BODIRSKY & ALEXANDER POPP 


Despite Australia’s vastness and its
swathes of untouched nature, its
per-capita environmental footprint is
one of the biggest worldwide. Because it is a
major exporter of agricultural products, coal 
and other emissions-intensive commodities, 
there is great concern that binding climate 
agreements could harm the country’s econ- 
omy. In 2014, under then prime minister Tony 
Abbott, the current conservative government 
replaced a carbon-tax policy with inefficient 
mitigation subsidies’. Abbott was toppled from 
the party leadership in September 2015. His 
successor, Malcolm Turnbull, was once a strong 
proponent of a carbon-trading scheme, but it 
remains uncertain whether environmental 
policies will be reformed under his leadership. 
On page 49 of this issue, Hatfield-Dodds 
et al.’ argue that Australia can stick to its 
materials-intensive industries and enjoy con- 
tinued high economic growth while reducing 
its impacts on climate, water and biodiver- 
sity. The authors show that greenhouse-gas 


emissions can be mitigated through efficiency
improvements in production processes, and
even more through carbon removal by plant-
ing forests (afforestation) and carbon cap-
ture and storage. The premise in any case is a
comprehensive pricing of emissions.

Figure 1 | Possible paths. (Panel titles: a, One scenario from Hatfield-Dodds et al.; b, Alternative scenario.) a, Scenario modelling presented by
Hatfield-Dodds et al.” suggests that Australia could maintain its economic
growth and its typically materials-intensive lifestyles, while reducing its
environmental impacts. Under this scenario, fossil fuels continue to be
burned, but in combination with carbon capture and storage. The
transport sector switches to electric and hybrid cars. Agriculture is
intensified and dominated by forest plantations to sequester carbon,
while biodiversity reserves and seawater desalination produce
ecosystem services. b, An alternative pathway, not simulated by the
authors, is a structural change towards a labour- and technology-intensive
economy, with dematerialized lifestyles. Energy is obtained from
renewable sources and public transport is expanded. Agriculture gradually
shifts from resource-intensive livestock and feed production towards diverse
high-value horticulture, and natural and agricultural systems are integrated.
We suggest that this pathway would be more resilient to technological or
institutional failure.

40 | NATURE | VOL 527 | 5 NOVEMBER 2015
© 2015 Macmillan Publishers Limited. All rights reserved

Hatfield-Dodds and colleagues’ assess-
ment, the most comprehensive conducted for
Australia so far, is based on the Australian
National Outlook 2015, a report’ prepared
by the country’s Commonwealth Scientific
and Industrial Research Organisation. The
authors used nine linked simulation models
to estimate the performance of Australia’s
economy in a global market, with a particular
focus on the agriculture, energy and trans-
port sectors, which exert the largest environ-
mental pressures on land, water and climate.
The modelling framework is exemplary in
bridging scales between global, national and
sub-national dynamics. This cross-scale
approach could, and should, become seminal
for future regional assessments.

The study produces 20 scenarios for
Australia’s future, exploring possible domestic
developments in regard to lifestyle, policy
and technological progress. All scenarios are
embedded in one of four possible settings
for global change, characterized by different
population trajectories and by different global
carbon prices, leading to 2, 3 or 6°C of global
warming above pre-industrial levels in the year
2100. The authors’ models then provide pro-
jections, under each scenario, for rates of tech-
nology adoption in the energy, transport and
agricultural sectors; for production, income,
and trade; and for environmental indicators
such as water usage, land clearing and green-
house-gas emissions.

The findings indicate that Australia’s gross
domestic product will more than double by
2050 in all scenarios. However, without car-
bon pricing, greenhouse-gas emissions would
increase by up to 90% in the same period. Even
with a carbon tax at a similar level to that in
force in 2012-14, Australia’s emissions are
projected to rise by about 25% by 2050. Com-
plying with a 2°C global-warming target will
require higher taxes, which Hatfield-Dodds
et al. show can be reached most cost-effectively
in Australia through large-scale afforestation 
and renaturation programmes. In an inter- 
national carbon market, such greening pro- 
grammes can become a profitable export 
industry through the sale of carbon credits. 

The general outcome of this Australian 
assessment is in line with the findings of the 
Special Report on Emissions Scenarios pro- 
duced by the Intergovernmental Panel on 
Climate Change (IPCC)’, which concluded that 
immediate and global action to limit warming 
to 2°C by 2100, in combination with the full 
availability of key technologies, would entail 
losses in global consumption of 2-6% (median 
3.4%) in 2050 and 3-11% (median 4.8%) in 
2100. But Hatfield-Dodds and colleagues’ 
regional study argues that even Australia, 
with its high dependence on fossil-fuel and 
agricultural exports, and with high per-capita 
emissions, does not need to fear increased 
mitigation costs, because it can remain one of 
the most cost-efficient producers. 

However, although the study shows that 
Australia can reduce emissions and environ- 
mental impact while sticking to its materi- 
als-intensive production and consumption 
patterns, the authors assess only a selection 
of potential pathways (Fig. 1). Within the 
literature on future scenarios*°, the possi- 
bilities considered by Hatfield-Dodds et al. 
describe a rather optimistic future in terms 
of political institutions and technological 
performance, and envisage a society open to 
trade and migration and with materialistic life- 
styles. Ecosystem services are valued, but with 
a curative rather than a preventive approach 
to environmental damage. Focusing on this 
strand of scenarios might mask certain risks 
and opportunities. 

One such risk is that future technologies will 
perform less well than we expect them to. For 
example, the performance of carbon-capture- 
and-storage technologies and of large-scale 
afforestation enormously influence the chal- 
lenges and mitigation costs of reaching ambi- 
tious climate targets’. In a world that relies on 
resource-intensive growth, if such mitigation 
options fail, this could escalate abatement costs 
or render climate targets unachievable. 

Society might also fail to establish the 
institutional framework required to embed a 
materials- and energy-intensive economy into 
environmental systems. Such a framework 
requires not only a timely international agree- 
ment on global carbon pricing, but also the 
regulation of other indirect environmental 
costs that are not reflected by market prices 
(externalities), such as groundwater use or 
nutrient pollution. Hatfield-Dodds and col- 
leagues’ study clearly shows that, without 
such policy frameworks, problems rapidly 
emerge — for example, fast-growing forests 
planted for carbon sequestration can lead to 
extreme water scarcity in certain catchment 
areas. Other side effects could include the 
increased use of pesticides and fertilizers when 


afforestation reduces the areas available for 
crops’, or the disruption of marine ecosystems 
as a result of water desalination’. 

The study convincingly argues that lifestyle 
changes, such as reduced working time, are not 
sufficient to solve environmental problems. 
But such changes do help to relieve pressure 
in the water-energy-food-climate-biodiversity
nexus'’ and might lessen the grave consequences
of technological or institutional
failure. Even in high-abatement scenarios, 
Hatfield-Dodds and colleagues estimate that 
per-capita energy demand will not fall below 
current levels, and that the global demand for 
animal products will double. Here, they may 
underestimate the potential for behavioural 
change, which was also highlighted in the 
IPCC’s Fifth Assessment Report’. 

This work reinforces the appraisal that 
global pricing of greenhouse gases is essen- 
tial to mitigate climate change effectively and 
efficiently’, and that it should be supported by 
a general regulation of environmental exter- 
nalities to avoid unwanted effects. Anchoring 
mitigation commitments in a global climate 
treaty has the capacity to protect Australia’s 
economy from unfair competition and to allow 
continued growth. 

Beyond this, this paper and other findings 
of the Australian National Outlook’ should 
trigger debate on how to shape Australia’s 
future. Continuous, resource-intensive growth 
is one possible pathway, but it will require 
powerful institutions to restrain the pressure 
on environmental systems. Another pathway 
could be an economy shaped by technology 
and labour instead of energy and resources, 
allowing less-strict regulation to keep the 
economy within environmental boundaries.
The structural change needed for the latter 
pathway could be initiated by investing car- 
bon-tax revenues in education and science, 
establishing markets for flexible electricity 
consumption, providing bicycle and public- 
transport infrastructure and promoting 
healthy and sustainable diets. Australia is free 
to choose which path to follow.
Droplets leap into action


What could cause a water droplet to start bouncing on a surface? It seems that a 
combination of evaporation and a highly water-repellent surface induces droplet 
bouncing when ambient pressure is reduced. SEE LETTER P.82 


DORIS VOLLMER & HANS-JÜRGEN BUTT


On page 82 of this issue, Schutzius et al.’
report a remarkable phenomenon: at
low pressure, droplets of water resting
on an extremely water-repellent surface spon-
taneously jump and bounce. In some cases, the
height of each bounce increases, like a gymnast
jumping on a trampoline. The findings add to
our understanding of how droplet-surface
interactions can prevent the accumulation of
water or ice on surfaces.

Ice accretion on surfaces is a big problem in
cold regions, particularly for aviation, shipping
or offshore industries”. Strategies to minimize
ice adhesion include using either smooth or 
highly water-repellent (superhydrophobic) 
surfaces. Superhydrophobic surfaces are 
covered with tiny protrusions that have low 
interfacial energy, which minimizes their 
attraction to liquids. 

A water or ice droplet resting on a super- 
hydrophobic surface sits on top of the protru- 
sions, so that the main part of the droplet’s 
underside is separated from the surface’s 
substrate by a thin layer of air’ (Fig. 1). The 
small contact area between the water or ice 
and the protrusions ensures low ice adhesion. 
However, the remaining adhesion is usually 
still sufficiently strong to keep ice in place. 
Furthermore, because the volume of water
increases during freezing, water droplets can 
expand into the space between protrusions 
upon freezing, increasing both the contact area 
and the adhesion of the resulting ice. 

So how can low pressure cause droplets on a 
superhydrophobic surface to start trampolin- 
ing? Schutzius and co-workers propose that 
two effects need to be considered. First, as 
noted above, the surface reduces the droplets’ 
adhesion. Such low adhesion has been shown 
to cause droplet jumping when two droplets 
merge, because the adhesion energy is easily 
overcome by the surface energy that is released 
by the merging* (surface energy quantifies 
the disruption of intermolecular bonds that 
occurs in a liquid when a surface is formed). 
The second effect is evaporation. When 
water evaporates in still air, the rate of evapora- 
tion is limited by the ability of the water vapour 
to diffuse. Reducing the pressure of the sur- 
rounding gas increases the diffusion, and thus 
the rate of evaporation. 
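
As a rough illustration of this scaling, and assuming ideal-gas behaviour at a fixed temperature T (an assumption made here, not a result from the study), kinetic theory gives a diffusion coefficient D that is approximately inversely proportional to the gas pressure p. A diffusion-limited evaporation rate J, which is proportional to D, then scales in the same way:

```latex
D \propto \frac{T^{3/2}}{p}
\quad\Longrightarrow\quad
\frac{J(p)}{J(p_0)} \approx \frac{D(p)}{D(p_0)} = \frac{p_0}{p}
\qquad (T \text{ fixed}),
\qquad\text{e.g.}\quad
\frac{p_0}{p} = \frac{1\ \text{bar}}{0.05\ \text{bar}} = 20 .
```

Under these assumptions, lowering the pressure from 1 bar to about 0.05 bar (a pressure quoted later in this article) would increase diffusion-limited evaporation by roughly a factor of 20.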

In Schutzius and colleagues’ study, gas 
and water vapour are rapidly pumped away 
from the experimental chamber. A film of 
water vapour therefore remains only in the 
gap between the droplets’ underside and the 
surface substrate, because the water vapour’s
escape from this region is inefficient. An 
overpressure therefore builds up in the gap — 
that is, the pressure in the gap becomes higher 
than that of the surrounding atmosphere. 

The authors argue that the droplet jumps once 
the force induced by the overpressure on the 
droplet overcomes gravity and adhesion. The 
gravitational force on droplets of 1 millimetre 
radius is about ten times lower than the adhesive 
force, so less than 10% of the total overpressure 
needed to cause jumping is used to overcome 
gravity. But gravity is, of course, required
for the droplet to fall back to the surface. 
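
The arithmetic behind the "less than 10%" figure can be written out as follows. The symbols F_g (gravitational force), F_a (adhesive force) and F_p (force from the overpressure) are introduced here for illustration only, and the factor of ten is the approximate ratio quoted above:

```latex
F_p \;\ge\; F_g + F_a ,
\qquad
F_g \approx \tfrac{1}{10} F_a
\quad\Longrightarrow\quad
\frac{F_g}{F_g + F_a} \approx \frac{\tfrac{1}{10}F_a}{\tfrac{11}{10}F_a} = \frac{1}{11} \approx 9\% .
```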

When droplets land back on the surface, 
they spread and their kinetic energy is trans- 
formed into surface energy. This spreading is 
followed by retraction into an almost spheri- 
cal droplet, during which the surface energy is 
transformed back into kinetic energy and the 
droplet bounces up again. For millimetre-sized 
droplets, spreading and retraction take several 
milliseconds”*. 

By calculating the volume of water vapour 
that can pass through the gap between the 
underside of the droplet and the surface’s sub- 
strate per unit of time, the authors show that 
overpressure builds up beneath the droplet 
about ten times faster than the typical con- 
tact time of a droplet with the surface. The 
overpressure induces an upward force on the 
droplet that adds to the force caused by the 
conversion of surface energy to kinetic energy 
when the droplet retracts. This additional 
force can increase the height of the droplet’s 
bounces, until a maximum height is reached 
after a few rebounds. 

At sufficiently low ambient pressure, the
temperature in the droplet can fall below its
freezing point because of cooling caused by
evaporation’. Schutzius et al. report that jump-
ing can also be triggered by freezing of such
supercooled water droplets — the latent heat
released on freezing causes a sudden over-
pressure and the droplet jumps off the substrate.

Figure 1 | Mechanism of droplet trampolining. Schutzius et al.’ report that, in a low-pressure
environment, water or ice droplets placed on superhydrophobic surfaces (which are covered with
micrometre-sized hydrophobic protrusions) can spontaneously jump and bounce. The authors propose
that, when a droplet is in contact with the surface, water-vapour molecules from the droplet escape more
slowly from the gap beneath the droplet’s underside than they do from elsewhere. The pressure in the
gap therefore becomes larger than ambient pressure, generating a force (arrows) that lifts the droplet up.
(Inset adapted from ref. 9.)

Droplet trampolining resembles the Leiden- 
frost phenomenon, which can be observed 
when water is spilt on a hot pan. A liquid drop- 
let in close contact with a hot, solid surface 
gives rise to a vapour layer beneath the droplet; 
this vapour keeps the liquid from making direct 
physical contact with the surface. Typically, the 
droplet immediately starts to hover and move 
around. By contrast, the onset of trampolin- 
ing can be fine-tuned by adjusting the time at 
which the system is depressurized. Another 
difference is that the Leidenfrost effect is caused 
by an imposed temperature difference between 
the droplet and surface, whereas droplet 
trampolining is caused by a pressure difference
generated by the droplet itself. 

Inertia and viscous dissipation (the con- 
version of a fluid’s surface and kinetic energy 
into internal energy) typically dominate the 
rebound of a droplet from a superhydrophobic
surface. By contrast, trampolining results from 
a uniformly increasing force acting on the 
droplet’s lower surface. 

A force also acts on a droplet’s lower surface 
during pancake bouncing* — a phenomenon 
that occurs when droplets collide with super- 
hydrophobic surfaces made from an array of 
submillimetre-spaced, tapered protrusions. 
During pancake bouncing, droplets hitting 
the surface penetrate substantially into the 
array, whereby kinetic energy is transferred 
to interfacial energy. This process is followed 
by upward motion of the droplet out of the 
array through capillary action, during which 
the interfacial energy is transformed back into 
kinetic energy. The droplets then bounce off 
the surface in a pancake-like shape. 

Both the trampolining and pancake- 
bouncing mechanisms reduce the contact time 
of bouncing droplets compared with bouncing
on a normal surface’. However, unlike pancake
bouncing, droplet trampolining is expected 
to occur for a large variety of surface topo- 
graphies, as long as the gap beneath the drop- 
let is kept thin (at least 100 times less than 
the droplet diameter at a pressure of about 
0.05 bar). If the gap is too large, water vapour 
would escape too quickly to have an effect and 
the overpressure in the textured surface would 
not be high enough. 

Although Schutzius and colleagues’ obser- 
vations are fascinating, reducing atmospheric 
pressure is not a practical way of preventing 
icing in outdoor areas. And even for smaller 
areas, much energy is consumed in reducing 
the ambient pressure. Furthermore, evapora- 
tion eventually causes the droplets to become 
so small that they come to rest — although 
bouncing has not been maintained indefinitely 
in any other drop-impact experiments. 

Nevertheless, the authors have vividly illus- 
trated that simple experiments can yield sur- 
prising results. Applying underpressure to a 
system is the most common way to enhance 
evaporation, and is often used in chemical 
and technical laboratories. Who would have 
guessed that it could produce such spectacular 
dynamics?
Light on leptin
link to lipolysis 


Cutting-edge experiments show that the hormone leptin, which is secreted by 
fat cells, promotes fat loss by activating the release of catecholamine signalling 
molecules from neurons wrapped around the fat cells. 


JOHAN RUUD & JENS C. BRÜNING


Anyone wanting to lose a few extra
pounds might well wish that fat could
be burnt at the flick of a switch. As
Zeng et al.’ report in Cell, they have achieved 
just that in mice. In doing so, they reveal clues 
to the mechanism by which the hormone 
leptin promotes fat loss in mammals. 

One of the main functions of one type of fat, 
white adipose tissue (WAT), is to store lipids. 
WAT is also the primary source of leptin, 
which is secreted in response to lipid storage 
and acts in the brain to reduce body-fat mass”. 
Although many experiments* have shown that 
leptin activates lipolysis (lipid breakdown), the 
mechanisms that underlie this feedback loop 
are less well defined. In particular, although 
lipolysis is thought to be under tight control of 
the brain and the peripheral nervous system’, 
several key questions remain unanswered. For 
example, does WAT receive bona fide innerva- 
tion from the autonomic nervous system (the 
part of the peripheral nervous system that 
regulates day-to-day organ function)? And 
how are fat depots slimmed down when the 
brain is instructed that fat stores are more than 
sufficient? 
Zeng and colleagues used state-of-the-art
techniques to investigate whether the lipolytic 
effect of leptin is mediated by the autonomic 
nervous system. Technical innovations”® now 
allow researchers to clear intact organs of 
lipids, making the organs more transparent 
and thus amenable to visualization by micro- 
scopy. The authors exploited this advance 
to clear mouse-derived inguinal fat pads 
(masses of closely packed fat cells close to the 
hind leg), and then used sophisticated imag- 
ing techniques to reconstruct 3D anatomical 
pictures of the entire tissue’. This reconstruc- 
tion revealed that thick bundles of neuronal 
projections called axons cover the surface of 
the fat pad. 

The researchers found that these bundles 
belong to the sympathetic nervous system — 
the part of the autonomic nervous system that 
stimulates the fight-or-flight response, and 
that is responsible for accelerating heart rate, 
dilating pupils and activating sweat secretion. 
Indeed, many of the bundles expressed the 
enzyme tyrosine hydroxylase, which helps 
to synthesize catecholamine molecules such 
as noradrenaline that act as neurotransmit- 
ters in the sympathetic nervous system. Zeng 
and colleagues also showed in vivo that fat 
cells were located close to nerve fibres that
expressed tyrosine hydroxylase. Fat pads were
not analysed using electron microscopy, which
could have verified whether sympathetic neu-
rons terminate on fat-cell membranes. But
these data nonetheless indicate that tyrosine-
hydroxylase-expressing axonal projections
make contact with some fat cells.

Figure 1 | Sympathetic to fat loss. The hormone leptin is secreted from fat tissue called white adipose
tissue (WAT) in response to lipid storage. Zeng et al.' report that, in mice, leptin acts in the brain,
triggering signals that activate ganglionic neurons of the sympathetic nervous system whose projections
(called axons) wrap around fat cells. The neurons release the neurotransmitter molecule noradrenaline,
which signals to β-adrenoceptor proteins on the fat cells. This promotes phosphorylation (p) of the
enzyme hormone-sensitive lipase (HSL), triggering lipolysis (lipid breakdown) and so fat loss.

Next, Zeng et al. investigated the relation- 
ship between activation of the axon bundles 
and fat-cell metabolism using optogenet- 
ics — a revolutionary technique in which 
light-sensitive ion-channel proteins are selec- 
tively expressed in certain neurons and acti- 
vate those neurons when exposed to light’. 
Although the technique is commonly used on 
the brain, optogenetic experiments on other 
tissues are often hindered by the fact that neu- 
rons outside the brain can have long axons; 
this means that high levels of light-controlled 
ion-channel-protein expression are required 
to drive photoactivation of the distant projec- 
tions’. Exacerbating this problem, the axons 
that innervate the inguinal fat pad originate in 
clusters of neuronal cell bodies that are almost 
impossible to access for precise, chronic light 
stimulation. 

The authors overcame these technical 
challenges by using genetic techniques to 
specifically target sympathetic axons, and 
locally modulated the activity of axons inner- 
vating the fat pad. As a compelling verifica- 
tion of the method's effectiveness, illuminating 
the inguinal fat had the same effect as treating 
mice with leptin — levels of noradrenaline 
increased, as did phosphorylation (an activat- 
ing molecular modification) of hormone-sen- 
sitive lipase (HSL), an enzyme that the authors 
used as a measure of leptin-elicited lipolysis. 
Daily optogenetic activation of axons over 
several weeks reduced fat mass. Conversely, 
disrupting neuronal input to the fat pad 
genetically, surgically or pharmacologically 
almost completely blocked leptin-evoked HSL 
phosphorylation. This indicates clearly that 
leptin-triggered lipolysis depends on activa- 
tion of the sympathetic neurons that project 
to fat (Fig. 1). 

To investigate the molecular mechanism 
underlying this response to leptin, Zeng 
et al. analysed genetically engineered mice in 
which catecholamine signalling was blocked. 
The mice lacked either an enzyme involved 
in noradrenaline synthesis or isoforms of 
noradrenaline-receptor proteins called 
β-adrenoceptors, which are expressed by fat
cells. Although leptin treatment resulted in
phosphorylated HSL and fat loss in wild-type
mice, these effects were attenuated in both
types of mutant. Notably, mice lacking the
β-adrenoceptor isoforms β1 and β2 showed
more lipase phosphorylation and whole-body
fat loss than those lacking β1, β2 and β3, con-
sistent with a study’ that pointed to a domi-
nant role for β3 receptors in lipolysis.

Finding that sympathetic neurons innervate 
WAT and mediate leptin-stimulated lipolysis is
not surprising. However, Zeng and colleagues’ 
study fills a gap in our understanding of pre- 
cisely how organisms respond to an abundance 
of leptin. Their work also specifically demon- 
strates that sympathetic neurons projecting to 
WAT are a central trigger for leptin-mediated 
lipolysis. 

Of course, questions arise from these find- 
ings. Leptin is thought to signal through 
several brain areas"', but it remains unclear 
which neuronal networks sense increased 
blood leptin concentrations and control sym- 
pathetic relay stations to ultimately regulate 
lipolysis and fat mass. Notably, only half of the 
nerve fibres found in WAT expressed tyrosine 
hydroxylase, and the authors did not analyse 
the other half, nor the characteristics of the 
fat cells that the neurons innervate. Although 
their identities remain elusive, these neurons 
and fat cells hold the potential for further 
exciting discoveries. Future experiments 
should define the key brain areas that control 
sympathetic traffic to WAT and the molecular 
circuitry that controls lipolysis downstream of 
these effectors. 

Zeng et al. estimated that tyrosine- 
hydroxylase-expressing neurons envelop 
between 3 and 12% of fat cells, a relatively 
sparse coverage. Nonetheless, the fact that 
optogenetic activation markedly increased 
lipolysis indicates that catecholamine signal- 
ling through neuro-adipose junctions has an 
important role in the control of lipid homeo- 
stasis. Given that leptin resistance is a com- 
mon feature of obesity, it is to be hoped that 
this study will fuel further dissections of the 
brain-fat axis. It might also open a door to 
assessing the therapeutic potential of control- 
ling catecholamine signalling in fat. 


Electrical signalling 
goes bacterial 


The discovery that potassium ion channels are involved in electrical signalling 
between bacterial cells may help to unravel the role of ion channels in microbial 
physiology and communication. SEE ARTICLE P.59 


SARAH D. BEAGLE & STEVE W. LOCKLESS 


Biological membranes separate cells or 
cellular compartments from the rest of 
the world, protecting the internal con- 
tents from the sometimes hostile, and always 
different, external milieu. However, cells are 
not closed systems and must pass informa- 
tion and matter, including ions, selectively 
across the membrane barrier. Proteins called 
ion channels facilitate the movement of ions 
across the membrane by allowing each ion 
to flow passively down its electrochemical 
gradient. Although ion channels mediate 
rapid, long-range communication in eukary- 
otes (the group of organisms that includes 
plants, animals and fungi), a signalling role 
for bacterial ion channels has remained elu- 
sive’’. In this issue, Prindle et al.’ (page 59) 
report the first example of a bacterial 
potassium channel that functions in a signal- 
ling role, through long-range coordination of 
metabolic oscillations. 

The current study is an extension of the 
same laboratory's previous discovery’ that 
adherent communities of Bacillus subtilis 
bacteria, known as biofilms, grow in periodic 
cycles once the colony reaches a threshold 
size (Fig. 1). The authors proposed that these 
oscillations arise when the cells in the biofilm’s 
interior become deprived of glutamate, owing 
to high consumption of the amino acid by 
peripheral cells. Glutamate starvation in the 
interior cells reduces their production of 
ammonium ions, which the peripheral cells 
need, resulting in arrested cell growth in the 
periphery. Following replenishment of gluta- 
mate in the interior cells, ammonium produc- 
tion increases, leading to growth of peripheral 
cells. The linked metabolic processes of cells 
within the biofilm community raised the ques- 
tion of how the metabolic state of cells is com- 
municated over long distances. 

Maintenance of the proper intracellular 
concentrations of glutamate and ammonium 
depends on the electrical potential across the 
cell membrane”’, known as the membrane 
potential. Therefore, Prindle et al. investigated 
whether electrical signalling is responsible for 
the long-range coordination of metabolic 
oscillations across the bacterial population. 
Using a voltage-sensitive fluorescent dye, 
the authors detected rhythmic synchronized 
fluctuations in membrane potential across the 
biofilm. Eliminating the need for glutamate 
and ammonium by adding the amino acid glu- 
tamine to the cells’ growth medium quenched 
these fluctuations, thereby linking electrical 
signalling and metabolism. 

The observed changes in the membrane 
potential could result from sodium ions (Na⁺) 
moving into the cell, potassium ions (K⁺) 
moving out, or a combination of both. Using 
fluorescent dyes that specifically bind to either 
Na⁺ or K⁺, the researchers found a direct cor- 
relation between the timing of K⁺ efflux and 
changes in membrane potential, suggesting 
that K⁺ efflux might propagate signals across 
the biofilm. 
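
As a rough illustration of why K⁺ movement alters the voltage across a membrane, the equilibrium (Nernst) potential for K⁺ can be computed from the concentrations on either side. The sketch below is not taken from Prindle and colleagues' analysis. The concentrations and the temperature are assumed values chosen only to show the direction of the effect, and a real membrane potential depends on several ions and their permeabilities.

```python
import math

GAS_CONSTANT = 8.314462618      # J mol^-1 K^-1
FARADAY_CONSTANT = 96485.33212  # C mol^-1


def nernst_potential_mv(conc_out, conc_in, charge=1, temp_kelvin=303.15):
    """Equilibrium potential (inside relative to outside) in millivolts.

    conc_out and conc_in must be in the same units; temp_kelvin defaults
    to 30 degrees Celsius, which is an assumption made for this example.
    """
    thermal_voltage = GAS_CONSTANT * temp_kelvin / (charge * FARADAY_CONSTANT)
    return 1000.0 * thermal_voltage * math.log(conc_out / conc_in)


# Hypothetical concentrations in mM, not measurements from the study.
K_INSIDE = 150.0
resting = nernst_potential_mv(conc_out=5.0, conc_in=K_INSIDE)
after_release = nernst_potential_mv(conc_out=15.0, conc_in=K_INSIDE)

print(f"E_K before K+ release: {resting:.1f} mV")
print(f"E_K after K+ release:  {after_release:.1f} mV")
```

Under these assumed values, raising extracellular K⁺ makes the K⁺ equilibrium potential less negative, which is the sense in which K⁺ released by one cell can shift the voltage of its neighbours.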

Because the K⁺ channel YugO is involved in 
B. subtilis biofilm formation’, Prindle et al. next 
asked whether this channel mediates K⁺ efflux. 
As expected, glutamate limitation in wild-type 
cells led to K⁺ efflux, whereas no K⁺ efflux was 
observed in cells lacking the yugO gene. Simi- 
larly, deletion of the TrkA domain of YugO, 
which gates K⁺ flux, decreased the propaga- 
tion of electrical oscillations under limited glu- 
tamate conditions. These results indicate that 
YugO is activated by glutamate limitation and 
is required to propagate the K⁺ signal through 
the biofilm (Fig. 1). The use of extracellular K⁺ 
to propagate a metabolic stress signal through 
the bacterial community is reminiscent of the 
increase in extracellular K⁺ that drives the dila- 
tion of blood vessels in the mammalian brain* 
in response to stress, suggesting that some 
K⁺ channels in bacteria and eukaryotes have 
evolved to accomplish similar outcomes. 

Prindle and colleagues’ study establishes 
the first example of a signalling function for a 
bacterial K⁺ channel. Although previous stud- 
ies”’° have established a role for various classes 
of bacterial channel in regulating cellular 
osmotic pressure, the impressive evolution- 
ary conservation of eukaryotic and bacterial 
channels at the protein-sequence and struc- 
tural levels provides additional evidence that 
some bacterial ion channels probably have 
signalling roles''”’. It is notable that the first 
demonstration of a signalling role for bac- 
terial ion channels occurs in the context of 
bacteria acting as multicellular entities, and 
that this function serves to coordinate the 
metabolic states of neighbouring cells. 

Figure 1 | Shocking communication. Bacillus subtilis bacteria can form communities called biofilms, 
in which cells both in the interior and on the periphery require the amino acid glutamate to survive and 
grow. a, When peripheral cells take up most of the available glutamate, the interior cells become starved. 
Prindle et al.° propose that nutrient-stressed interior cells secrete potassium ions (K⁺) through the YugO 
K⁺ channel. b, The release of K⁺ ions then changes the transmembrane voltage of cells and leads to the 
subsequent release of K⁺ ions from neighbouring cells, propagating the starvation signal. c, The signal 
propagation ultimately reduces the uptake of glutamate in peripheral cells. Glutamate becomes available 
for interior cells to consume and the cycle is reset. 

Unlike a eukaryotic action potential (in 
which electrical signal propagation is fast, 
owing to the rapid rising and falling of the 
membrane potential), the signalling that 
coordinates metabolic oscillations in B. sub- 
tilis occurs over a longer time period. Using 
mathematical modelling, the authors provide 
evidence that K⁺ efflux alone can account for 
the slow signal propagation. This propagation, 
which is perhaps an evolutionary precursor 
to the faster action potential, seems to retain 
overall biofilm stability by synchronizing the 
growth of peripheral cells and metabolic main- 
tenance of interior cells. 

Although the present study highlights 
similar signalling roles for eukaryotic and bac- 
terial ion channels, many questions remain to 
be addressed. What are the metabolic inter- 
mediates that activate YugO following gluta- 
mate starvation in B. subtilis? One possibility 
is that the TrkA regulatory domain senses 
the energy level of the cell by binding ATP or 
ADP, two molecules that have been shown" 
to regulate the TrkA protein in other bacteria. 
What is the magnitude of the changes in mem- 
brane potential, and how does this affect other 
voltage-dependent processes in the mem- 
brane? More generally, it will be interesting to 
determine whether this mechanism is used by 
other community-forming species as a way to 
regulate metabolism and growth. 

The discovery of this K⁺ signalling mecha- 
nism highlights the complexities of bacterial 
social communication. Most bacterial cells are 
too small to require electrically propagated 
intracellular signalling; instead, diffusion of 
signalling molecules in the cytoplasm is suf- 
ficiently rapid. It remains to be seen whether 
other bacterial social behaviours are governed 
by electrical signalling. Perhaps such signal- 
ling plays a part in interspecies communi- 
cation, for instance between biofilms and 
epithelial cells in the gut. Like predator-prey 
interactions in some of the more complex 
eukaryotic species, it could be that microor- 
ganisms compete with each other by secret- 
ing toxins that interfere with an adversary’s 
ion-channel activities. Finally, the fact that 
signalling in the biofilm shares several char- 
acteristics with electrical signalling in the 
nervous system — including the use of the 
common neurotransmitter molecule glu- 
tamate — highlights an exciting functional 
connection between these evolutionarily 
distant systems. 


Quantum sound 
waves stick together 


A sensitive cold-ion experiment probes sound at the level of phonons, the 
fundamental quantum units of vibration. It shows that phonons mix in such a way 
that they can be classified as ‘bosonic’ particles, like photons. SEE LETTER P.74 


DAVE KIELPINSKI 


The phenomenon of wave interference is 
observed in various settings, including 
optics, electronics and acoustics. In con- 
structive interference, the crests and troughs of 
interfering waves reinforce each other, whereas 
in destructive interference they cancel each 
other out. Although we think of sound as con- 
sisting of macroscopic waves, it has a quantum 
nature. The energy of a sound wave is an integer 
multiple of a fundamental quantum of vibra- 
tional energy called a phonon. On page 74 of 
this issue, Toyoda et al.' report the effect of 
two-phonon interference, and show that the 
interfering phonons ‘stick together’ — they are 
never observed to go different ways. 

The interference of sound waves is not 
just of academic interest. For instance, it is 
the operating principle of noise-cancelling 
headphones. These create their own sound 
vibrations, which are tuned to destructively 
interfere with external vibrations. The two 
vibrations cancel at the ear, and so no sound is 
heard. By contrast, they constructively inter- 
fere at other locations, away from the ear. 

On a smaller scale, a quantum-mechanics 
principle dictates that when the number of 
phonons (or particles such as photons or 
electrons) is accurately known for a system, 
the locations of the crests and troughs of the 
waves associated with these particles cannot 
be known with certainty. However, one can 
still perform an interference experiment to 
see what happens. 

In 1987, the physicists Hong, Ou and 
Mandel demonstrated’ that, surprisingly, the 
interference of two photons is either com- 
pletely constructive or completely destruc- 
tive, and that the two possibilities coexist until 
the result of the experiment is observed. Two 
detectors that register photons at two output 
ports always measure zero photons at one port 
and two photons at the other. This is known as 
the Hong-Ou-Mandel effect. By contrast, in 
the case of electrons, it has been shown’ that 
two electrons will never register at the same 
port — they always go their separate ways. 
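
The contrast between photons and electrons can be made concrete with a textbook two-port calculation. In the minimal sketch below, which is an illustration rather than the analysis used in any of the cited experiments, two particles enter a mixer through separate inputs, and the probability that they leave through different outputs is computed from the mixing matrix: the two exchange paths add for bosons and subtract for fermions. The function name and the mixing-angle parametrization are assumptions made for the example.

```python
import math


def coincidence_probability(theta, statistics):
    """Probability that two particles, one per input port, exit separately.

    theta is the mixing angle of a real 2x2 mixing matrix;
    theta = pi/4 corresponds to a balanced (50:50) mixer.
    """
    c, s = math.cos(theta), math.sin(theta)
    u = [[c, -s], [s, c]]
    direct = u[0][0] * u[1][1]   # each particle stays on its own port
    swapped = u[0][1] * u[1][0]  # the two particles exchange ports
    if statistics == "boson":
        amplitude = direct + swapped  # permanent of u
    elif statistics == "fermion":
        amplitude = direct - swapped  # determinant of u
    else:
        raise ValueError("statistics must be 'boson' or 'fermion'")
    return amplitude ** 2


balanced = math.pi / 4
print("bosons:  ", round(coincidence_probability(balanced, "boson"), 12))
print("fermions:", round(coincidence_probability(balanced, "fermion"), 12))
```

For a balanced mixer the bosonic coincidence probability vanishes, so both particles are always found at the same output, whereas the fermionic one is unity, so the particles always separate.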

Toyoda and colleagues use highly sophis- 
ticated experimental-physics techniques to 
probe sound at the level of individual phon- 
ons. At room temperature, the atoms in matter 
show random thermally driven vibrations that 
act as background ‘noise’, overwhelming the 
quantum effects of sound. Only matter that has 
been cooled to near absolute zero temperature 
displays sufficiently small thermal vibrations 
to allow such effects to be measured. 

To perform these measurements, the 
authors use two calcium ions that have been 
electromagnetically trapped in a chamber 
under ultrahigh-vacuum conditions. Heat 
cannot reach the ions because they are not in 
contact with the chamber’s walls and there is 
no gas in the chamber to transfer it. Toyoda 
et al. suppress the ions’ residual thermal vibra- 
tions using a technique known as laser cooling, 
allowing the ions’ quantized vibrations (the 
phonons) to be revealed. By applying appro- 
priately tuned laser pulses to the ions, they can 
then add or remove vibrations, one phonon at 
a time. A follow-up sequence of pulses causes 
the ions to fluoresce only if they are vibrat- 
ing, and the authors use the detected fluores- 
cence as an optical marker for sound at the 
quantum level. 

To observe the interference of two phonons, 
Toyoda et al. start by ‘feeding’ one phonon to 
each ion. Because the two ions are positively 
charged, they repel each other, so that when 
one vibrates, the other one gently wiggles. This 
wiggling effect causes a phonon that starts out 
on its own ion to mix slowly with the other ion’s 
phonon, and so to interfere with it. The authors 
observe that, almost always, both phonons 
end up on the same atom, but which one? The 
phonons don’t care. In this situation, quantum 
mechanics predicts that both phonons reside 
together on one atom, and at the same time, 
both reside on the other atom — at least, as 
long as no one measures the location of the 
phonons. By adding an extra interference step 
to the experiment, the authors obtain substan- 
tial evidence that the phonons can, in fact, 
seem to be in both places at once. 


The authors’ results are a crucial test of 
the quantum theory of sound, and definitively 
prove that phonons are bosons rather than 
fermions (bosons, such as photons, are particles 
that have integer spin angular momentum, 
whereas fermions, such as electrons, have half- 
integer spin). Every quantum system falls into 
one of these two categories, and this classifica- 
tion has physical ramifications. For instance, 
laser-like wave emission commonly occurs in 
systems that have a large number of bosons — 
it has been observed for phonons in trapped- 
ion experiments’ and in microscopic devices 
known as toroidal resonators’. 
In optical systems, the Hong-Ou-Mandel 
effect powers applications such as quantum 
computing, simulation and sensing®. The 
current work indicates that phononic systems 
could also be suitable for quantum-enhanced 
applications. Nanometre-scale mechanical 
systems, although more vulnerable to thermal 
noise than trapped ions, offer a wider field of 
potential quantum phononics applications 
because they can operate at room temperature 
and under atmospheric pressure. Recently’, it 
has become possible to control and measure 
single phonons in nanomechanical systems; 
interference experiments may soon follow. As 
these systems become increasingly complex, the 
effects reported by Toyoda et al. might be used 
in the quantum engineering of acoustic devices 
and circuits. 


Dave Kielpinski is at Hewlett Packard 
Laboratories, Palo Alto, California 94304, USA. 
e-mail: david.kielpinski@hpe.com 


GENE REGULATION 

Expression feels 
two pulses 


Single-cell analyses reveal that combinatorial changes in the intracellular 
locations of transcription factors can tune the expression of the factors’ target 
genes in response to environmental stimuli. SEE ARTICLE P.54 


ANTOINE BAUDRIMONT & ATTILA BECSKEI 


Most transcription factors exert their 
effects continuously, but some act 
in pulses by moving rapidly in 
and out of the nucleus. Transmitting cellular 
signalling-pathway information in pulses or 
oscillations has several advantages over con- 
tinuous signalling. For example, information 
can be encoded in the frequency or amplitude 
of pulsing, boosting the amount of information 
transmitted. Investigating this phenomenon 
has proved difficult, however, because the 
behaviour of pulsatile transcription factors 
varies greatly from cell to cell’. In this issue, 
Lin et al.’ (page 54) overcome this hurdle and 
demonstrate that combinations of transcrip- 
tion-factor pulses that change in response 
to environmental stimuli can regulate gene 
expression. 
The Msn2 protein, which is expressed in the 
budding yeast Saccharomyces cerevisiae, was 
the first transcription factor to be identified 
as pulsatile, moving to the nucleus to activate 
transcription when cells are exposed to light’. 
Although such pulsatile signalling patterns 
can be advantageous, they are also prone to 
disruption, because the message transmit- 
ted varies as time passes. In electronics, such 
problems are typically solved by ensuring that 
more than one component can perform the 
same task. Theoretically, the same principles 
apply to cell signalling — propagating signal- 
ling pulses through multiple pathways that 
are then reintegrated is predicted to improve 
reliability’. Indeed, Msn2 is known‘ to act with 
the pulsatile transcriptional repressor protein 
Mig1 to control gene expression in response to 
various stresses. 

Lin et al. analysed the dynamics of Msn2 
and Mig1 pulses by generating strains of 
S. cerevisiae in which the two transcription 
factors were tagged by different fluorescent 
proteins, allowing their intracellular locations 
to be tracked. The authors attached these cells 
to a microfluidic device through which cell- 
growth media were passed, and monitored 
transcription-factor movements as well as 
any subsequent changes in the transcription 
of genes whose expression is regulated by 
both factors. 

Depleting glucose in the cell media triggered 
the export of Mig1 from the nucleus and the 
import of Msn2, increasing the expression of 
target genes. If Msn2 and Mig1 acted accord- 
ing to a simple continuous regulatory scheme, 
or if they were pulsatile but the timing of pulses 
was completely random, then glucose deple- 
tion would gradually alter the average level of 
each transcription factor in the nucleus across 
the population of cells, and the expression 
of target genes would gradually increase to a 
new steady-state level. Instead, however, the 
authors observed a ‘transient phase’ imme- 
diately after glucose depletion, during which 
the average nuclear levels of Msn2 and Mig1 
were higher and lower, respectively, than when 
they subsequently reached steady-state levels 
(Fig. 1a). An overshoot such as this is often 
observed when systems adapt to change’®, and 
it can decrease the time it takes for target- 
gene expression levels to reach the new steady 
state. Indeed, a kinetically similar response is 
known to occur’ when glucose concentration 
increases: target-gene expression is repressed 
by Mig1, lowering levels of the corresponding 
RNA transcript, but a transient destabilization 
of the transcript helps to speed up the process 
by promoting transcript degradation. 

Lin and colleagues hypothesized that the 
overshoot they observed was not just a tran- 
sient event, but might persist in steady-state 
conditions in the form of pulses that could 
not be observed on a population-wide level 
because their effects averaged out. When ana- 
lysing single cells, however, the authors found 
that the pulsing of each transcription factor 
was sporadic and irregular under steady-state 
environmental conditions, making it hard to 
define individual pulses. In principle, many 
criteria could be used to define such pulses, but 
most would be of little practical relevance. The 
authors developed an interesting and prag- 
matic approach to detecting individual pulses, 
based on a neuroscience technique called 
spike-triggered averaging®. In their adapted 
version, which the authors dubbed pulse- 
triggered averaging, pulses were measured as 
averages that were based on the dynamics of 
Mig1 and Msn2 over a set time period around 
peaks in nuclear Msn2 levels. 
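
The idea behind such triggered averaging can be sketched in a few lines. The example below is a simplified illustration, not the authors' implementation: the peak criterion, the threshold, the window length and the toy traces are all assumptions.

```python
from statistics import mean


def find_peaks(trace, threshold):
    """Indices of local maxima in `trace` that exceed `threshold`."""
    return [
        i for i in range(1, len(trace) - 1)
        if trace[i] > threshold and trace[i - 1] < trace[i] >= trace[i + 1]
    ]


def pulse_triggered_average(trigger, signal, threshold, half_window):
    """Average `signal` over fixed windows centred on peaks of `trigger`."""
    peaks = [
        p for p in find_peaks(trigger, threshold)
        if p - half_window >= 0 and p + half_window < len(signal)
    ]
    if not peaks:
        return []
    return [
        mean([signal[p + lag] for p in peaks])
        for lag in range(-half_window, half_window + 1)
    ]


# Toy single-cell traces in arbitrary units, not data from the study.
msn2 = [0, 0, 1, 3, 1, 0, 0, 0, 1, 3, 1, 0, 0]
mig1 = [2, 2, 2, 0, 2, 2, 2, 2, 2, 0, 2, 2, 2]

msn2_avg = pulse_triggered_average(msn2, msn2, threshold=2, half_window=2)
mig1_avg = pulse_triggered_average(msn2, mig1, threshold=2, half_window=2)
print(msn2_avg)
print(mig1_avg)
```

Aligning every window on a peak of the trigger trace is what lets an irregular, sporadic signal yield a clean average pulse shape, together with the typical behaviour of the second factor around that pulse.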

Justifying their approach, the authors 
demonstrated that a target gene responded to 
changes in Msn2 or Mig1 that the technique 
registered as pulses. For instance, Msn2 pulses 
were followed by elevated gene expression if 
there was no overlap with a Mig1 pulse (that 
is, if Mig1 was not in the nucleus at the same 
time). Conversely, there was a decrease in 
target-gene expression if the Msn2 pulse was 
counteracted by an overlapping Mig1 pulse. 
These observations confirm that the tran- 
scription-factor pulses do indeed persist under 
steady-state conditions (Fig. 1b). 

Figure 1 | Interpreting transcription-factor pulses. The transcription factors Msn2 and Mig1 enter 
the nucleus in pulses to respectively activate and repress transcription of the same target genes. a, Lin 
et al.’ report changes in transcription-factor pulsing in response to environmental stimuli. A decrease in 
glucose concentration in the medium around the cell causes a large transient decrease in the level of Mig1 
in the nucleus, and a rapid increase in Msn2. This ‘overshoot’ allows cells to adapt to change by promptly 
increasing gene expression. After this transient phase, nuclear levels of each factor, and hence gene 
expression, remain steady on a population-wide level. b, The authors show that it is only in the transient 
phase that all cells display synchronized non-overlapping pulses. After this phase, levels of each factor 
pulse randomly in single cells. When pulses of both factors overlap in the nucleus, gene expression falls. 
However, when pulses of Msn2 do not overlap with Mig1, expression increases. 

At elevated steady-state glucose concentra- 
tions, the percentage of overlapping pulses 
increased to a level beyond that expected by 
chance, enhancing the efficiency with which 
target-gene expression was repressed. By con- 
trast, during the transient phase, Lin and col- 
leagues detected only non-overlapping pulses, 
suggesting that during this period the timing 
of pulses is modulated. In this way, pulses are 
synchronized between cells, resulting in an 
overshoot at the population-wide level. 

Pulse-triggered averaging could now 
become a powerful tool for analysing other 
regular oscillating reactions in cells. So far, 
most studies have focused on average cell 
behaviour, but cell-cycle checkpoints, for 
instance, elicit single-cell responses with 
considerable cell-to-cell variability’. Pulse- 
triggered averaging may help to disentangle 
the underlying regulatory interactions. 

Moreover, the authors’ approach makes it 
possible to analyse gene regulation without 
understanding all of a system’s parameters. 
For instance, the current study showed not 
only that a fully overlapping repressor pulse 
can neutralize an activating pulse, but also 
that the same neutralization can occur when 
the two pulses are separated by a few min- 
utes, without needing to understand the root 
causes. It is important to note that not all Msn2 
pulses elicited target-gene expression, even 
when Mig1 activity was low. This provides a 
reminder that mass-action kinetics, stochas- 
tic modelling and identification of reaction 
mechanisms must be included in complete 
models of gene regulation. Research into these 
topics has undergone marked development in 
recent years, and may soon converge, making 
it possible to understand the dynamics of 
signalling pathways in detail. 


Antoine Baudrimont and Attila Becskei 


Australia is ‘free to choose’ economic 
growth and falling environmental 
pressures 


Steve Hatfield-Dodds, Heinz Schandl, Philip D. Adams, Timothy M. Baynes, Thomas S. Brinsmead, Brett A. Bryan, 
Francis H. S. Chiew, Paul W. Graham, Mike Grundy, Tom Harwood, Rebecca McCallum, Rod McCrea, Lisa E. McKellar, 
David Newth, Martin Nolan, Ian Prosser & Alex Wonhas 


Over two centuries of economic growth have put undeniable pressure on the ecological systems that underpin human 
well-being. While it is agreed that these pressures are increasing, views divide on how they may be alleviated. Some 
suggest technological advances will automatically keep us from transgressing key environmental thresholds; others that 
policy reform can reconcile economic and ecological goals; while a third school argues that only a fundamental shift in 
societal values can keep human demands within the Earth’s ecological limits. Here we use novel integrated analysis of the 
energy-water-food nexus, rural land use (including biodiversity), material flows and climate change to explore whether 
mounting ecological pressures in Australia can be reversed, while the population grows and living standards improve. 
We show that, in the right circumstances, economic and environmental outcomes can be decoupled. Although economic 
growth is strong across all scenarios, environmental performance varies widely: pressures are projected to more than 
double, stabilize or fall markedly by 2050. However, we find no evidence that decoupling will occur automatically. Nor 
do we find that a shift in societal values is required. Rather, extensions of current policies that mobilize technology and 
incentivize reduced pressure account for the majority of differences in environmental performance. Our results show 
that Australia can make great progress towards sustainable prosperity, if it chooses to do so. 


Our analysis uses a new integrated multi-model framework developed 
for the Australian National Outlook!. Australia is globally relevant: a 
major exporter of energy, mineral and agricultural products, with high 
per capita income, greenhouse gas emissions, water extractions, and 
habitat loss. The framework assesses energy—water-food interactions 
(and links to ecosystem services) in the context of climate change’, and 
uses more than 20 scenarios to explore a diverse range of factors shaping 
future Australian economic and environmental outcomes”. Interacting 
national trends and policies include energy and resource efficiency, 
agricultural productivity, consumption and working hours, and new 
land-sector markets for energy feed-stocks and ecosystem services 
(carbon sequestration and biodiversity conservation). These are mod- 
elled against four levels of national and global greenhouse gas emissions 
reduction effort (from no abatement to very strong abatement), and 
associated global climate trajectories (see Extended Data Fig. 9). As well 
as assessing the range of scenario outcomes, we identify the relative con- 
tributions of different types of choices. ‘Collective choices’ are defined 
as decisions that can only be implemented by groups of actors, and 
then constrain or empower ‘individual choices’ (particularly through 
changing rules and institutions). For example, individual choices about 
whether to drive or catch a train to work are strongly shaped by prior 
collective choices about transport infrastructure. 

The framework accounts for detailed interactions across sec- 
tors and spatial scales. The focal scale is national (the continent of 
Australia), accounting for key processes at higher (global) and lower 
(sub-national) spatial scales. This cross-domain integrated approach 
is needed because partial assessments may not account for constraints 
or adverse impacts that would undermine an otherwise ‘sustainable’ 
trajectory*®. The projections and indicators are fully consistent with 
the international System of National Accounts’. We provide more 
details in the Supplementary Methods (section ‘Overview of modelling 
framework and scenarios’) and results for more than 60 national and 
global indicators in the Supplementary Data. 

Novel aspects of the analysis include assessing the potential for mar- 
kets for ecosystem services to supply carbon sequestration and habitat 
restoration (and implications for agricultural output’ and extinction 
risk)!01, assessing future water stress rather than simple volume of 
water extracted”!?; exploring material extractions and environmental 
footprints’; and integrating these elements with established models 
for analysing energy, greenhouse gas emissions and economic perfor- 
mance”!*-!”. We are not aware of any other future-looking modelling 
that integrates this range of issues and indicators (Supplementary 
Methods, ‘Overview of modelling framework and scenarios’). 


Economic and physical decoupling is possible 

We find that substantial economic and physical decoupling is possible'®. 
Economically, Australia can achieve strong economic growth to 2050, 
indicated by rising gross domestic product (GDP) and gross national 
income (GNI) per capita, in scenarios where environmental pressures 
fall or are stable. Physically, we find the services derived from natural 
resources (energy (Extended Data Fig. 2), water (Extended Data Fig. 3), 
food (Extended Data Fig. 4)) can increase, while associated environ- 
mental pressures ease (greenhouse emissions (Extended Data Fig. 6), 
water stress (Extended Data Fig. 3), native habitat loss (Extended Data 
Fig. 5)). Importantly, these projected decouplings do not involve a 
reduction in the value of Australia’s heavy industry (Extended Data 
Fig. 1g), or outsourcing its environmental footprint to other nations’*"”. 
Instead energy- and material-intensive sectors are projected to increase 
their share of economic activity, even in scenarios with the strongest 
global abatement efforts’”. 

Figure 1 | Economic activity (GDP) and national income (GNI) continue 
to rise strongly in all scenarios. Projections for 20 scenarios. GDP 
measures the market value of goods and services produced. GNI here 
measures payments to national residents from domestic production (as 
foreign production is not modelled). All values are in real 2010 Australian 
dollars, adjusted for inflation; one trillion is defined as 1 × 10¹². Neither GDP 
nor GNI is adjusted for changes in asset values, such as depreciation or the 
depletion of stocks of natural resources, and so neither measures pure income. 
More information on models and scenarios is provided in Supplementary 
Methods, ‘Overview of modelling framework and scenarios’. Sources: 
Supplementary Data worksheets 1a and 1c. 

Energy Centre, Mayfield West, NSW 2304, Australia. CSIRO, Waite Campus, Urrbrae, SA 5064, Australia. CSIRO, Queensland Biosciences Precinct, St Lucia, QLD 4067, Australia. 7CSIRO, Ecosciences 

In all scenarios, Australia’s economy and living standards are pro- 
jected to grow strongly (see Extended Data Fig. 1). As shown in Fig. 1, 
the value of economic activity (GDP) is projected to rise tenfold 
over the 80 years to 2050, driven by a 2.9-fold increase in population 
(Extended Data Fig. 8) and a 3.2-3.6-fold increase in GDP per capita 
(all values are in real 2010 Australian dollars, adjusted for inflation). 
National income (GNI) grows at a similar rate to GDP, with GNI per
capita increasing by 58-82% from 2010 to 2050. Around two-thirds of 
the range of outcomes is explained by choices about working hours and 
consumption rather than environmental constraints. Average incomes 
rise by up to 66% if average working hours decline another 11% over 
the next four decades, in line with recent trends, and rise by 75% or 
more if there is no decline in working hours. The remaining income 
differential is accounted for by different assumptions and outcomes on 
resource efficiency, new land markets, agricultural productivity, and 
national and global abatement efforts. 
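
The headline growth figure can be checked with a back-of-envelope decomposition. The sketch below assumes that GDP is simply population multiplied by GDP per capita, and uses only the fold increases quoted above; the function and variable names are illustrative and are not part of the modelling framework.

```python
# Back-of-envelope check, assuming GDP = population x GDP per capita.
# Inputs are the fold increases quoted in the text for the period to 2050.
POPULATION_FOLD = 2.9                # fold increase in population
GDP_PER_CAPITA_FOLD = (3.2, 3.6)     # range of fold increases in GDP per capita


def gdp_fold_increase(population_fold: float, per_capita_fold: float) -> float:
    """Return the implied fold increase in total GDP."""
    return population_fold * per_capita_fold


low = gdp_fold_increase(POPULATION_FOLD, GDP_PER_CAPITA_FOLD[0])
high = gdp_fold_increase(POPULATION_FOLD, GDP_PER_CAPITA_FOLD[1])
print(f"Implied GDP fold increase: {low:.1f} to {high:.1f}")
```

Under this assumption the product spans roughly 9- to 10-fold, consistent with the tenfold rise described in the text.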

Net greenhouse emissions show a clear decoupling from the grow- 
ing economy, falling to zero or lower in some scenarios by 2040 (top 
row of Fig. 2). Australian emissions per capita could fall below the 
global average by 2050, from four times the global average today 
(Extended Data Figs 6b and 9f). One-third to one-half of Australia’s 
projected emissions reductions are achieved through biosequestra- 
tion from large areas of new carbon plantings (29-59 Mha in 2050, 
see Extended Data Fig. 5). The remainder is achieved by reducing 
the emissions- and resource-intensity of the economy. If there is a 
strong or very strong abatement effort, domestic emissions could fall 
by up to 33%, even as GDP grows more than 150%; and energy emis- 
sions could fall by up to 29% while energy use grows by 55-120%. 
Similarly, the total mass of fossil fuels, metals, non-metallic minerals
and biomass Australia uses is projected to decrease by 36% by 2050
in scenarios with very strong abatement and improved resource efficiency
(Extended Data Fig. 1h). In other scenarios, total resource use
is projected to increase by 69%.
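
The degree of decoupling implied by these figures can be expressed as a change in intensity (pressure per unit of activity). The sketch below is illustrative only: it pairs the bounding values quoted in the text ("up to 33%", "more than 150%"), which may not come from the same scenario, and the function name is an assumption rather than anything defined in the Supplementary Methods.

```python
def intensity_change(pressure_change: float, activity_change: float) -> float:
    """Fractional change in pressure per unit of activity.

    Both arguments are fractional changes, e.g. -0.33 for a 33% fall
    and 1.50 for a 150% rise.
    """
    return (1.0 + pressure_change) / (1.0 + activity_change) - 1.0


# Domestic emissions fall by 33% while GDP grows by 150%.
emissions_per_gdp = intensity_change(-0.33, 1.50)

# Energy emissions fall by 29% while energy use grows by 55% or by 120%.
energy_low = intensity_change(-0.29, 0.55)
energy_high = intensity_change(-0.29, 1.20)

print(f"Emissions per unit of GDP: {emissions_per_gdp:.0%}")
print(f"Emissions per unit of energy: {energy_low:.0%} to {energy_high:.0%}")
```

On these illustrative pairings, emissions per unit of GDP fall by roughly three-quarters, and emissions per unit of energy fall by more than half.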

Figure 2 | Decoupling of emissions, water stress, and native habitat from
the supply of energy, water and food, respectively, for 18-21 scenarios,
1970-2050. Each panel shows the scenario trajectories for a key indicator
of resource use or environmental pressure. The shaded areas indicate
scenarios in which environmental pressure decreases from current levels
(in the left-hand panel), with the same scenarios shaded in the right-hand
panel of each row. Models and scenarios are described in Supplementary
Methods, ‘Overview of modelling framework and scenarios’, and information
on performance of multiple pressures across scenarios is provided in
Supplementary Methods, ‘Analysis of multiple pressures across scenarios’.
Sources: Supplementary Data worksheets 6a, 2a, 3e, 3a, 5h and 4d.


National water extractions (by all sectors) are projected to increase 
by up to 101% by 2050. However, up to half (32-56%) of this water 
demand can be met by desalinisation in coastal cities and water recy- 
cling for industrial use. Water stress, indicated by rain-fed water use in 
water-limited catchments, improves or is stable in 7 of 18 scenarios
(and is sensitive to governance of new carbon and biodiversity plant- 
ings, as noted below). 

Pressures on biodiversity can also be reduced alongside economic 
growth and increased agricultural activity—resulting in increased 
native habitat and agricultural output volumes (including protein) 
in many scenarios (bottom row of Fig. 2). Settings that give weight
to biodiversity restoration could see mixed local native species plant- 
ings make up 36-47% of all carbon plantings in 2050 (against only 
5% under a carbon-focused approach), increasing native habitat by 
up to 25% (37 Mha) in Australia’s intensive use zone, and reversing 
the long-term trend. With strong abatement incentives, we find 
11 Mha of habitat could be restored without large government outlays, 
reducing climate-related extinction risk by 7-9% (assessed for RCP 
4.5 climate).

However, these carbon and biodiversity plantings would reduce surface
water flows, which could exacerbate pressures on river-based ecosystems
in water-limited catchments (middle row of Fig. 2). Integrated
governance is needed to properly balance their interceptions with
competing extractive uses (Supplementary Methods, ‘Analysis of
multiple pressures across scenarios’). Existing Australian governance
arrangements cap extractions from water-limited catchments
around current levels. The requirement to hold a water licence for
new plantings embeds the price of water licences in these governance
arrangements, as discussed below. Alternative governance assumptions
could further restrain plantings, better safeguarding river health, but
forgoing up to 0.5 Gt (5%) of cumulative national carbon sequestration
by 2050.

Figure 3 | Comparing living standards and emission outcomes across
multiple scenarios. Differences in national income (GNI) and net
greenhouse gas emissions in 2030 and 2050, relative to existing trends.
Calculations based on 18 scenarios. Emissions, water stress and native habitat
all improve or are stable in three scenarios, combining step change energy
efficiency with very strong abatement (L1XI), marked as (a), or strong
abatement (M3XI) (b), or trend energy efficiency with strong abatement
(M3XR) (c). Differences shown are relative to existing trends (M2XR)
controlling for working hours and consumption trends. Scenario assumptions
and notation (such as M2XR) are described in the text and Supplementary
Methods, ‘Calculations for Figure 3 and assessment of potential economic
performance with different levels of global and national action to reduce
greenhouse emissions’. Extended Data Fig. 6e shows time paths for each
scenario from 2015 to 2050. Source: Supplementary Data worksheet 6e; see
Extended Data Figs 1c and 6a.

Overall, two-thirds of the scenarios assessed (13 of 18) show 
improvement in at least one environmental indicator, but only three 
scenarios (all involving strong or very strong abatement and new 
land markets) show improvement or stable performance in all three 
environmental indicators, reflecting the tensions between reducing 
water stress and restoring terrestrial native habitat, and the impor- 
tance of integrated governance (see Supplementary Methods Fig. 6 
and Supplementary Methods, ‘Analysis of multiple pressures across 
scenarios’).


Policies to ease pressures extend established options

The scenario assumptions that result in reduced environmental pres- 
sures are all continuations of existing trends, combined with greater 
uptake of energy and water efficiency, and a shift towards stronger 
global and national greenhouse gas abatement (Supplementary 
Methods, ‘Overview of modelling framework and scenarios’). Policy 
settings reflect market-based approaches that are already in place in 
Australia or other countries. 

Greenhouse gas abatement is modelled as a uniform global broad- 
based carbon price, representing a variety of potential real-world mixes 
of regulation, standards, grants, taxes, or cap-and-trade arrangements. 
The carbon price in 2015 is US$15 (moderate scenario), US$30 (strong) 
and US$50 (very strong) per tonne of CO2 emissions, and increases by
around 4.5% per year in real terms (above inflation) to 2050. This drives 
a 90% reduction in the emissions intensity of Australian electricity from 
2010 to 2050 in the stronger abatement scenarios (eliminating coal- 
fired electricity without carbon capture and storage before 2035 under 
the highest carbon price). Wholesale generation prices are 61-106% 
higher in 2050, and household electricity prices are 11-12% higher 
(strong) or 32% higher (very strong), compared to the no-abatement 
scenarios. However, affordability changes very little, owing to higher 
household incomes (in all scenarios) and higher energy efficiency in 
scenarios with higher prices.
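
The carbon price paths described above follow from simple compound growth. The sketch below assumes a constant real growth rate of exactly 4.5% per year applied to the 2015 starting prices given in the text; the constant and function names are illustrative and do not come from the modelling framework.

```python
START_YEAR = 2015
REAL_GROWTH = 0.045  # "around 4.5% per year in real terms"; treated here as exact

# Starting prices in US$ per tonne of CO2, as stated in the text.
START_PRICES = {"moderate": 15.0, "strong": 30.0, "very strong": 50.0}


def carbon_price(scenario: str, year: int) -> float:
    """Real carbon price (US$ per tonne of CO2) under constant compound growth."""
    years_elapsed = year - START_YEAR
    return START_PRICES[scenario] * (1.0 + REAL_GROWTH) ** years_elapsed


for scenario in START_PRICES:
    print(f"{scenario}: US${carbon_price(scenario, 2050):.0f} per tonne in 2050")
```

Under this assumption, each price rises a little more than 4.6-fold in real terms between 2015 and 2050.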

Payments to Australian landholders for biosequestration are 15% 
below the global carbon price, with the forgone carbon revenue applied 
to increasing the share of native habitat plantings from 4-5% to 36-46% 
of total area in 2050. The resulting biodiversity ‘top up payments’ 
account for 22-30% of payments to habitat plantings in these scenar- 
ios over the decade to 2050, complementing carbon income. (These 
payments should be interpreted as a one-off payment for implement- 
ing a conservation covenant, for the area of new habitat added in that 
period.) 

On water, we find that interceptions from new plantings result in 
increased water stress in many of the very strong abatement scenarios 
(which have the highest levels of new plantings). We find the profit- 
ability of carbon plantings is not sensitive to water licence prices: a 
doubling results in just a 4% reduction in the area of new plantings 
in water-limited catchments. Limiting the area of plantings to avoid 
this increased water stress would require a 200% increase in the water 
licence price (increasing the asset value of licences to existing owners). 


Policy choices are crucial, not changes in values 

These results provide insights into the contested relationship between
economic growth and environmental sustainability, complementing
historical analyses (Supplementary Methods, ‘Competing
views on the prospects for sustainability’). A ‘technological optimist’
view holds that market-driven technological advances will ensure that
growth does not transgress key environmental thresholds. Others
suggest that institutional reform and new policies could achieve
necessary changes within established values and paradigms,
noting that environmental damage may occur during the long lags
between problem identification and policy responses. A
third ‘communitarian limits’ view argues that sustainability will require
a fundamental shift in societal values, often involving a rejection of
economic growth, or a shift from consumerism to a values-based
commitment to living within ecological limits.

We find that decoupling economic growth from environmental pres- 
sure before 2050 would not require a change in societal values, but is 
not automatic—contrary to both the communitarian limits and tech- 
nological optimist positions. It is not projected to occur under existing 
trends, and requires, in our scenarios, collective choices to increase 
global and national abatement efforts. 

The analysis explores potential behavioural change in several ways. 
The modelling simulates bottom-up individual choices on working 
hours and consumption that shape production and consumption 
as incomes rise (income elasticity) and relative prices change (price 
elasticity). These choices interact with different assumptions about
policy settings (reflecting collective choices), such as incentives for 
greenhouse gas abatement, and about bottom-up trends, such as the 
uptake of energy and water efficiency. None of the scenarios assume a 
new social or environmental ethic. In particular, increasing Australia’s 
abatement effort in line with emissions reductions by other countries 
would be consistent with Australian public opinion and assessments
of Australia’s national interest in limiting the rise in average global
temperature to 2°C, and so is not interpreted as implying a
change in values. Rather, the analysis reflects how goal-oriented human 
behaviour can change with circumstances (including new information, 
or changes in the actions of others), without requiring any change in 
underlying goals and values. 

We find collective policy choices are crucial, explaining 46-94% of 
differences in environmental performance and resource use across 
the scenarios examined (see Extended Data Fig. 7 and Supplementary 
Methods, ‘Assessing the contributions of individual and collective 
choices’). Consistent with the institutional reform approach, we
find top-down collective choices are particularly important in shaping 
‘public good’ outcomes—accounting for 83-94% of the differential in 
scenario outcomes for net greenhouse gas emissions, and 69-89% for 
greenhouse emissions excluding land sector sequestration. Bottom-up 
individual choices play a greater role when private and public benefits 
are aligned, such as when improved resource efficiency delivers finan- 
cial savings. Individual choices account for up to half of the differential 
in scenario outcomes for energy use (33-47%) and non-agricultural 
water consumption (16-53%). 


Giving value to natural assets can build new advantage 
Economic analysis of climate change mitigation typically finds that lim- 
iting emissions involves near-term costs, but can yield net benefits over 
the long term (well after 2050) through avoided climate impacts.
Near-term co-benefits such as improved air quality and human health
are also identified. However, our analysis identifies additional
near-term economic benefits for nations with a comparative advan- 
tage in ecosystem services, particularly carbon sequestration from 
reforestation. For these nations, stronger action to improve resource 
efficiency and environmental performance could unlock new sources 
of economic opportunity and growth, boosting near-term income while 
protecting natural assets essential to long-term well-being. 

Figure 3 compares national income and net emissions outcomes in 
2030 and 2050 for 18 scenarios. All seven stronger abatement scenarios 
(blue and purple) with land sector markets have better economic perfor- 
mance to 2050 than those with moderate abatement (green scenarios). 
National income (GNI) in 2050 in these scenarios is up to 6% higher 
than under existing trends (see quadrant 1). These win-win outcomes 
occur because carbon sequestration becomes more profitable than beef 
and other agricultural production across large areas of Australia (up to 
58 Mha, or 70% of the intensive-use zone), in a world taking stronger 
action to reduce emissions. Stronger abatement incentives also promote 
electrification and the use of biofuels in road transport, reducing oil 
imports. These economic gains outweigh the costs of more stringent 
national emissions targets, as well as the impacts of lower global demand 
for (and value added from) Australia’s emissions-intensive exports, rel- 
ative to moderate national and global abatement (see Supplementary 
Methods, ‘Calculations for Fig. 3 and assessment of potential economic 
performance with different levels of global and national action to reduce 
greenhouse emissions’ and Extended Data Fig. 1i).

Across the scenarios explored, we find land-sector markets are 
needed to exploit these shifts in comparative advantage. Quadrant 4 
reflects missed opportunities, including the scenario where very strong 
abatement action without land-sector markets leads to the worst rel- 
ative economic performance (solid purple circle). Other scenarios in 
this quadrant involve transitions: pathways where emissions reductions 
generate net costs around 2030, but net benefits by 2050, relative to 
existing trends (see Extended Data Fig. 6e for time paths). 

Quadrant 2 shows the scenarios in which there is no global or
national action to reduce emissions, reflecting a decline from current 
modest abatement efforts. Here, national income in 2050 is projected to 
be 5-7% higher than for existing trends, while emissions are projected 
to be 35-51% higher. These scenarios illustrate the classic ‘unsustain- 
able development’ trade-off, where higher near-term living standards 
are achieved at the cost of increased risks and future damage to the 
Earth’s natural capital and life-support systems. Adverse environmental
feedbacks might see these scenarios shift towards quadrant 3
after 2050, combining worse economic performance and higher emis- 
sions. Limitations of the current modelling framework suggest that the 
analysis is likely to overstate the relative economic performance of the 
no-action scenarios (orange) and understate that of the very strong 
abatement scenarios (purple), because it does not fully account for all 
potentially significant climate impacts.
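
The quadrant labels used in this discussion can be summarised as a simple classification of each scenario by its income and emissions differences relative to existing trends. The sketch below infers the quadrant layout from the descriptions in the text; treating a difference of exactly zero as "no worse" is an assumption, and apart from the quadrant 2 values the example numbers are hypothetical.

```python
def quadrant(income_diff_pct: float, emissions_diff_pct: float) -> int:
    """Classify a scenario relative to existing trends.

    1: higher income, lower emissions (win-win)
    2: higher income, higher emissions ('unsustainable development' trade-off)
    3: lower income, higher emissions
    4: lower income, lower emissions (missed opportunities or transitions)
    """
    higher_income = income_diff_pct >= 0      # assumption: zero counts as no worse
    lower_emissions = emissions_diff_pct <= 0  # assumption: zero counts as no worse
    if higher_income and lower_emissions:
        return 1
    if higher_income:
        return 2
    if not lower_emissions:
        return 3
    return 4


# No-action scenarios: income 5-7% higher, emissions 35-51% higher (from the text).
print(quadrant(5.0, 35.0))
# Hypothetical stronger-abatement scenario with land-sector markets.
print(quadrant(6.0, -50.0))
```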


Making progress towards sustainable prosperity 

In summary, we find that Australia could materially ease environmental
pressures while enjoying strong economic growth. Many of
the 20 scenarios we explored would represent substantial progress
towards sustainable prosperity. Australia could begin to repair past
damage, restoring significant areas of native habitat and achieving
negative emissions (net sequestration) of greenhouse gases. But
none of these scenarios would guarantee sustainability, or eliminate
future threats to Australia’s natural capital and the Earth’s life-support
systems. Instead, each implies a different portfolio of risks
and opportunities, which we have not fully modelled beyond 2050.
For example, new native habitat established before 2050 could provide
a permanent flow of biodiversity benefits and other ecosystem
services, while the flow of carbon sequestration provided will peak
and eventually decline to zero, drawing attention to challenges and
opportunities beyond our modelling horizon, such as the possibility
of using carbon plantations to generate negative emission bioenergy
with carbon capture and storage.

Reducing environmental pressures will not require a shift in societal 
values, but neither will technology deliver it automatically. Collective 
choices and public policy settings make a crucial contribution, and
well-designed markets can boost national income by exploiting new 
areas of comparative advantage in some circumstances. However, these 
scenarios may present new longer-term risks and opportunities, and 
the synergies and trade-offs involved will be influenced by global cir- 
cumstances. We also find an important threshold effect: moderate 
global action to reduce greenhouse emissions may diminish Australia’s 
traditional comparative advantage (particularly in fossil fuel-based 
sectors) without creating new areas of advantage; while stronger 
global action that places tangible value on emissions reductions could
open new opportunities to create value, providing win-win economic
and environmental benefits relative to existing trends. While
Australia could dramatically reduce environmental pressures across 
a wide range of global contexts, the economic costs of doing so will be 
smaller (and benefits larger) in global settings that support the stable 
functioning of key Earth systems, including through promoting clean 
energy. As these global circumstances emerge, Australia’s opportuni- 
ties will multiply. 

Sustainable prosperity is possible, but not predestined. Australia is 
free to choose. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

Extended Data Figure 1 | Australian economic activity, income and
living standards, and material- and energy-intensive industries to 2050.
Projections for 20 scenarios for nine indicators, and touchstone scenarios
for one indicator. Income, consumption, and average working hours provide
indicators of living standards. PES refers to payments for ecosystem services
(carbon sequestration and habitat restoration). Definitions of scenarios and
scenario assumptions, details of scenario sets, a full list of indicators, and
references for historical data are provided in the Supplementary Information.

Extended Data Figure 2 | Australian energy use to 2050. Projections for 18 or 20 scenarios for three indicators, and touchstone scenarios for two indicators.
Definitions of scenarios and scenario assumptions, details of scenario sets, a full list of indicators, and references for historical data are provided in the
Supplementary Information. CCS, carbon capture and storage.
[Figure panels: 2d, Australian electricity supply by source, touchstone scenarios, 2010-2050; 2e, Australian transport energy by fuel type, touchstone scenarios, 2010-2050.]

Extended Data Figure 3 | Australian water use to 2050. Projections for
20 scenarios for two indicators and 18 scenarios for six indicators. Total
water use is made up of extractive use plus interceptions of surface flows
by new plantings that would otherwise contribute to streamflow. Water
use in water-limited catchments provides an indication of water stress.
Definitions of scenarios and scenario assumptions, details of scenario sets,
a full list of indicators, and references for historical data are provided in the
Supplementary Information.
[Figure panels: 3a, Water use, total (including interceptions), all catchments; 3c, Agricultural extractive water use, all catchments, 18 scenarios, 1994-2050; 3d, Water interceptions from land use change, all catchments, 18 scenarios, 2013-2050; 3h, Water interceptions from land use change, water-limited catchments, 18 scenarios, 2013-2050.]

Extended Data Figure 4 | Australian agriculture output values, volumes
and land use to 2050. Projections for 21 scenarios for 12 indicators. Food
grains are a sub-set of crops. Protein calculation based on agricultural output
volumes for all food commodities (including cereals, beef, sheep, legumes
and dairy milk), weighted using USDA (2014). Definitions of scenarios
and scenario assumptions, details of scenario sets, a full list of
indicators, and references for historical data are provided in the
Supplementary Information.

Extended Data Figure 5 | Australian land sector output values, volumes
and land use to 2050. Projections for 21 scenarios for eight indicators. Total
land sector activity is made up of agriculture (detailed in Extended Data
Fig. 4) and payments for ecosystem services (carbon sequestration and
habitat restoration) (see Extended Data Fig. 1i, j). Definitions of scenarios
and scenario assumptions, details of scenario sets, a full list of indicators,
and references for historical data are provided in the Supplementary
Information.

Extended Data Figure 6 | Australian greenhouse gas emissions and
abatement to 2050. Projections for 18 scenarios for four indicators, and
touchstone scenarios for one indicator. Domestic net emissions are defined
as direct emissions less carbon sequestration (CCS and biosequestration)
before trade in international emissions units. Calculations for Extended Data
Fig. 6e are set out in Supplementary Methods, ‘Calculations for Fig. 3 and
assessment of potential economic performance with different levels of global
and national action to reduce greenhouse emissions’. Definitions of scenarios
and scenario assumptions, details of scenario sets, a full list of indicators, and
references for historical data are provided in the Supplementary Information.

Extended Data Figure 7 | Maximum and minimum contributions of
individual and collective choices to differences in projected greenhouse
gas emissions, energy use, and non-agricultural water use in 2050.
Calculations based on 20 scenarios, as described in Supplementary Methods,
‘Assessing the contributions of individual and collective choices’, drawing on
data from Extended Data Figs 6a, b, 2a and 3b, c. Scenario assumptions and
characteristics of the modelling framework prevent meaningful analysis of
other indicators of environmental pressure for this purpose, such as total
water use including agricultural extractions. Definitions of scenarios and
scenario assumptions, details of scenario sets, a full list of indicators,
and references for historical data are provided in the Supplementary
Information.

Extended Data Figure 8 | Australian population, 1970-2050. Population
trajectory assumed in all domestic National Outlook scenarios. Information
on age structure and dependency ratios is provided in ref. 1. Definitions
of scenarios and scenario assumptions, details of scenario sets, a full
list of indicators, and references for historical data are provided in the
Supplementary Information.

Extended Data Figure 9 | World population, economic activity, energy,
emissions and agriculture to 2050. Projections for four global context 
scenarios for 11 indicators, and for three global context scenarios for two 
indicators. The global scenarios assume different combinations of population 
and cumulative greenhouse gas emissions, implying different levels of global 
abatement effort as well as different patterns of global demand and supply of 


(medium population, moderate abatement) global scenario also assumes 
higher global agricultural productivity, resulting in lower agricultural prices 
than would be projected otherwise. Definitions of scenarios and scenario 
assumptions, details of scenario sets, a full list of indicators, and references for 
historical data are provided in the Supplementary Information. 


Combinatorial gene regulation by
modulation of relative pulse timing 


Yihan Lin1,2, Chang Ho Sohn3, Chiraj K. Dalal1,2†, Long Cai3 & Michael B. Elowitz1,2


Studies of individual living cells have revealed that many transcription factors activate in dynamic, and often stochastic,
pulses within the same cell. However, it has remained unclear whether cells might exploit the dynamic interaction of
these pulses to control gene expression. Here, using quantitative single-cell time-lapse imaging of Saccharomyces
cerevisiae, we show that the pulsatile transcription factors Msn2 and Mig1 combinatorially regulate their target
genes through modulation of their relative pulse timing. The activator Msn2 and repressor Mig1 showed pulsed
activation in either a temporally overlapping or non-overlapping manner during their transient response to different
inputs, with only the non-overlapping dynamics efficiently activating target gene expression. Similarly, under
constant environmental conditions, where Msn2 and Mig1 exhibit sporadic pulsing, glucose concentration modulated
the temporal overlap between pulses of the two factors. Together, these results reveal a time-based mode of combinatorial
gene regulation. Regulation through relative signal timing is common in engineering and neurobiology, and
these results suggest that it could also function broadly within the signalling and regulatory systems of the cell.


In order to respond to environmental conditions, cells make extensive
use of combinatorial gene regulation, in which two or more
transcription factors co-regulate common target genes. Most analysis
of combinatorial regulation presumes that the concentrations of
transcription factors in the nucleus are regulated in a continuous
(non-pulsatile) manner. However, recent work has identified a large
and growing list of transcription factors that activate in pulses. In
such systems, a single pulse begins when many molecules of a given
transcription factor are activated simultaneously, and ends when
they are deactivated. Such pulses can occur repetitively, even
under constant conditions. Pulsatile regulation has been observed
in bacteria, yeast, and mammalian stress response and
signalling pathways. In these systems, inputs typically modulate
the pulse frequency, amplitude, and/or duration of individual
transcription factors to regulate genes. However, despite analysis of
many individual pulsatile transcription factors, the interactions
between multiple pulsatile systems in the same cell have not yet been
explored and analysed.

Saccharomyces cerevisiae provides an ideal model system to analyse
such dynamic transcription factor interactions. It contains several
well-characterized pulsatile systems that control core cellular functions.
In particular, the general stress response transcription factor
Msn2, and its paralogue, Msn4, activate hundreds of target genes in
response to diverse stresses including ethanol, heat, oxidative stress,
salt, and glucose starvation. Similarly, the repressor Mig1, along
with its paralogue, Mig2, controls many target genes, especially those
involved in metabolism, in response to changes in glucose concentration.
Together, Msn2 and Mig1 co-regulate over 300 target
genes (according to Yeastract). Both Msn2 and Mig1 are activated
by dephosphorylation, which leads to nuclear localization.
Previous work has shown that Msn2 nuclear localization can occur
in a pulsatile fashion in response to various inputs. Mig1 is
known to quickly localize to the nucleus in response to an increase in
glucose levels, and can also exhibit pulsatile activation.


Two stages of dynamic pulsing 


To analyse Msn2 and Mig1 dynamics in the same cell, we constructed
strains expressing fusions of Msn2 and Mig1 proteins to the distinguishable
fluorescent proteins mKO2 and mCherry, respectively
(Fig. 1a). To simplify the analysis, we knocked out their paralogues
Msn4 and Mig2 (Methods). We attached single cells to the glass
surface of a microfluidic channel, maintaining a constant flow of
media, while acquiring time-lapse movies. By analysing individual
cells in these movies, we could track the nuclear localization dynamics
of both proteins over time (Methods).

We first analysed the effects of glucose reduction, which is known
to induce changes in nuclear localization for both transcription
factors. In response to a sudden step from 0.2% to 0.1% glucose,
both proteins exhibited pulses of nuclear localization, but did so with
different timing (Fig. 1b). Msn2 localized to the nucleus immediately,
while Mig1 exited the nucleus. Subsequently, in many cells (75%),
Msn2 exited the nucleus followed by the re-entry of Mig1 (Fig. 1b;
Supplementary Video 1). This transient response terminated within
~30 min (Fig. 1b, bottom). We describe events like this, in which
Msn2 and Mig1 pulses are temporally separated, as non-overlapping
(see Fig. 1b, top and Methods). After this event, Msn2 and Mig1
exhibited sporadic pulsing that was unsynchronized between cells
(Supplementary Video 1). During this steady-state period, we
observed both overlapping (that is, coincident) events, in which
Msn2 and Mig1 pulses overlap, as well as non-overlapping events
in which Msn2, but not Mig1, localized to the nucleus (Fig. 1b, top
and Methods).
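
The distinction between overlapping and non-overlapping events can be illustrated with a minimal sketch. It assumes that each pulse has already been reduced to a (start, end) interval in minutes and that an Msn2 pulse counts as overlapping if it intersects any Mig1 pulse; the actual criteria are given in the Methods, and the function names and example intervals below are hypothetical.

```python
from typing import List, Tuple

# A pulse is a (start, end) interval of nuclear localization, in minutes.
Pulse = Tuple[float, float]


def intervals_overlap(a: Pulse, b: Pulse) -> bool:
    """True if the two intervals share any span of time."""
    return a[0] < b[1] and b[0] < a[1]


def classify_msn2_events(msn2_pulses: List[Pulse], mig1_pulses: List[Pulse]) -> List[str]:
    """Label each Msn2 pulse by whether it coincides with any Mig1 pulse."""
    labels = []
    for pulse in msn2_pulses:
        coincident = any(intervals_overlap(pulse, other) for other in mig1_pulses)
        labels.append("overlapping" if coincident else "non-overlapping")
    return labels


# Hypothetical single-cell trace: three Msn2 pulses and two Mig1 pulses.
msn2 = [(0.0, 8.0), (40.0, 46.0), (70.0, 75.0)]
mig1 = [(10.0, 20.0), (43.0, 50.0)]
labels = classify_msn2_events(msn2, mig1)
print(labels)
```

In this toy trace only the second Msn2 pulse coincides with a Mig1 pulse; the first and third are temporally separated from Mig1 localization.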

These data provoke two interrelated questions about whether and 
how relative pulse timing could function in combinatorial regulation 
(Fig. 1c): first, do inputs modulate the relative timing of transcription 
factor pulses, either during the transient response to a change in 
conditions, or during the subsequent period of repetitive pulsing? 
Second, if so, how does such pulse timing modulation affect down- 
stream combinatorial gene regulation? 


1Howard Hughes Medical Institute, California Institute of Technology, Pasadena, California 91125, USA. 2Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, USA. 3Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA. †Present address: Department of Microbiology and Immunology, UCSF, San Francisco, California 94143, USA.
Figure 1 | Temporally structured pulsing of transcription factors Msn2
and Mig1 in response to glucose reduction. a, Inputs such as glucose regulate
the phosphorylation and nuclear localization of Msn2 and Mig1, which
co-regulate some common target genes. Three-colour strains allow simultaneous
analysis of Msn2 and Mig1 nuclear localization dynamics and target
gene expression. Yeast strains contained Msn2 (green) and Mig1 (red)
fluorescent protein fusions, along with a target promoter with (shown) or
without (not shown) binding sites for Msn2 and Mig1, driving expression of a
transcript containing 24 stem-loops that are specifically bound by the PP7
RNA binding protein fused to 2×GFP (blue circles). b, An example single-cell
trace showing nuclear localization dynamics of Msn2 and Mig1. The cell
exhibits an immediate temporally structured response to the step in glucose
(arrowhead and dashed line), as well as sporadic pulsing throughout the movie.
Filmstrips show examples of non-overlapping and overlapping events. White
dashed circles indicate cell boundaries and numbers indicate time points.
Scale bar is 2 μm. Lower plot shows average trace, revealing the synchronized
transient non-overlapping response followed by a constant average response
due to unsynchronized pulsing. Shading indicates 95% confidence intervals of
the mean (Methods). c, These dynamics provoke the questions of how inputs
modulate relative timing of Msn2 and Mig1 pulses, and how that timing
affects gene regulation.


To address these questions, we constructed strains containing synthetic
target promoters incorporating binding sites for either or both
transcription factors (Fig. 1a). These promoters drove expression of a
transcriptional reporter consisting of 24 binding sites for a separately
expressed PP7 RNA binding protein fused to green fluorescent protein
(GFP) (Fig. 1a). These strains enabled us to simultaneously
follow localization dynamics of Msn2 and Mig1 and downstream
target expression in the same cell.


Relative pulse timing in the transient response 

We first analysed transient responses to changes in various input conditions
(that is, different Msn2 stressors) other than the known common
input glucose (Fig. 2a). Addition of 100 mM NaCl produced
transient non-overlapping pulses of Msn2 and Mig1 in single cells
and in population averages (Fig. 2b, Extended Data Fig. 1a-c,
Supplementary Video 2) that were similar to those observed in the
transient response to glucose reduction (Fig. 1b). Addition of 2.5%
ethanol also activated both transcription factors. But in contrast to
NaCl, it did so with overlapping, rather than non-overlapping, pulses
(Fig. 2c, Extended Data Fig. 1d-f, Supplementary Video 3). The difference
in relative timing between NaCl and ethanol was also apparent in
cross-correlation analysis (Extended Data Fig. 1g). Together, these
results indicate that distinct inputs can generate opposite relative timing
in the transient responses of Msn2 and Mig1.

We hypothesized that control of temporal overlap could provide a
mechanism for combinatorial gene regulation. Non-overlapping
pulse dynamics, in which the activator Msn2 is active, but the repressor
Mig1 is not, could activate combinatorial target genes more
efficiently than overlapping pulses, in which the two proteins are
simultaneously bound to the same target promoter. Indeed, while
both NaCl and ethanol led to activation of an Msn2-specific target
Figure 2 | Different inputs produce distinct transient gene expression
responses by modulating relative pulse timing. a, Transient nuclear
localization and gene expression responses were simultaneously monitored in
individual cells. b, c, Addition of NaCl (100 mM) or ethanol (2.5%) induced
non-overlapping and overlapping responses, respectively. Green and red traces
show mean Msn2 and Mig1 nuclear localization, respectively. d, e, Averaged
single-cell transcriptional activity traces show that NaCl activated both
combinatorial and Msn2-specific targets, while ethanol activated only the
Msn2-specific target. Shading in b-e indicates 95% confidence interval of the
mean. f, qPCR data are consistent with single-cell data (b, c), and extend these
responses to heat shock and H2O2 stresses (Extended Data Fig. 1h, i; see
Methods). Error bars indicate s.e.m. calculated from 3-8 biological replicates.


promoter, only the non-overlapping dynamics of NaCl efficiently
induced target expression (Fig. 2d, e, Extended Data Fig. 1a-f).
Moreover, we observed similar timing-mediated regulation with
other stresses. Heat shock and oxidative stress (from H2O2) induced
non-overlapping and overlapping dynamics, respectively (Extended
Data Fig. 1h, i). As with the other stresses, both non-overlapping and
overlapping dynamics activated an Msn2-specific target promoter,
but only non-overlapping dynamics efficiently activated the combinatorial
target promoter (Fig. 2f). As expected, the dependence of
expression from the synthetic combinatorial target promoter on relative
timing required both Msn2 and Mig1 (Extended Data Fig. 2a). In
addition, these effects were not specific to the synthetic target promoter,
as expression of GSY1 (ref. 41), an endogenous target of Msn2
and Mig1, exhibited similar dependence on relative timing in response
to stresses, as shown by both single-cell analysis and quantitative
PCR data (Extended Data Fig. 2b-e). In fact, further genome-wide
analysis revealed 30 additional endogenous targets that exhibited a
similar pattern of gene regulation during transient responses to NaCl
and ethanol (Methods, Extended Data Fig. 2f-k, and Supplementary
Discussion), suggesting that relative timing-dependent regulation
applies to multiple endogenous target genes, as well as to the synthetic
promoter. Together these data indicate that, during transient stress
responses, cells regulate gene expression by modulating the relative
pulse timing between Msn2 and Mig1.


Regulation by relative pulse timing at steady-state 


We next asked whether relative pulse timing could also function in 
constant environmental conditions where both transcription factors 
pulse sporadically and repetitively. Because such pulsing is not synchronized
among cells, it could only be analysed with single-cell
movie data. We observed both overlapping and non-overlapping
pulse events under constant conditions (Fig. 1b, Fig. 3a, Extended
Data Fig. 3a, b, and Supplementary Videos 4, 5). To better understand
the effects of each type of event on gene expression, we adapted the
technique of pulse-triggered averaging from neurobiology (usually
called spike-triggered averaging) (Extended Data Fig. 3c). We identified
Msn2 pulses, and sorted them into two groups depending on
whether or not a Mig1 pulse overlapped temporally with the Msn2
pulse (Fig. 3a, Methods). We then averaged the Msn2 and Mig1
dynamics over a time window around the Msn2 pulse peaks, for both
overlapping and non-overlapping events. By construction, the resulting
pulse-triggered averages showed opposite overall dynamic relationships
between the two proteins (Fig. 3b, c).
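The averaging step described above can be illustrated with a minimal sketch. It assumes each single-cell trace is a list sampled at one frame per minute and that pulse peak frames have already been identified; the function name `pulse_triggered_average` and the `half_window` parameter are hypothetical and are not taken from the analysis code used in this study.

```python
from statistics import mean


def pulse_triggered_average(trace, peak_frames, half_window=10):
    """Average fixed-width windows of `trace` centred on each peak frame.

    trace        -- list of signal values, one per frame (assumed 1 frame/min)
    peak_frames  -- frame indices of the triggering (e.g. Msn2) pulse peaks
    half_window  -- frames kept on each side of the peak (assumption)
    """
    windows = []
    for peak in peak_frames:
        start, stop = peak - half_window, peak + half_window
        if start < 0 or stop >= len(trace):
            continue  # skip peaks too close to the start or end of the movie
        windows.append(trace[start:stop + 1])
    if not windows:
        return []
    # Average column-wise, so index `half_window` corresponds to t = 0.
    return [mean(column) for column in zip(*windows)]
```

Calling the function once for the overlapping group of Msn2 peaks and once for the non-overlapping group, on Msn2, Mig1 or reporter traces, would give averages of the kind compared in the text.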

Pulse-triggered averaging enabled us to analyse the dependence of
target gene expression on Msn2 pulsing and, more specifically, on its
temporal relationship with Mig1, averaged over variability in both
pulsing behaviour and downstream transcriptional responses (see
Supplementary Discussion about the multiple layers of variability in
this system). Both overlapping and non-overlapping pulses led to a
subsequent increase in the mean expression of the pure Msn2 synthetic
target promoter (Extended Data Fig. 4a-c). However, only the
non-overlapping events showed activation of the synthetic combinatorial
Msn2-Mig1 promoter or the natural combinatorial target gene,
GSY1 (Fig. 3d, e). Moreover, deletions of the zinc-finger DNA binding
domains of either Msn2 or Mig1 eliminated the relative timing-dependence
of GSY1 expression, indicating that DNA-binding of
both proteins is necessary for relative timing-dependent regulation
(Extended Data Fig. 4d). Together, these results show that relative
timing between Msn2 and Mig1 pulses regulates gene expression
under steady-state conditions.

Thus far, we have simplified the analysis of relative pulse timing by
classifying events as either overlapping or non-overlapping. However,
cross-correlation analysis revealed more complexity in the dynamics.
For example, we observed a peak at a positive time lag of ~2-4 min,
corresponding to sequential activation of Msn2 followed by Mig1
(Extended Data Fig. 4f-i, also evident in Fig. 3c, f; see Supplementary
Discussion). More generally, the data showed a continuous distribution
of time intervals between a given Msn2 pulse and its previous, or
subsequent, Mig1 pulse. To better understand how these dynamics
affect target gene expression, we analysed the dependence of mean
expression level on the continuous time interval between Msn2 and
Mig1 pulses (Extended Data Fig. 5a, b). Mean gene expression is minimal
when Msn2 and Mig1 pulse simultaneously, but Mig1 pulses
occurring within ~4-5 min before or after Msn2 pulses also suppress
mean expression. These results are consistent with a model in which
Mig1 pulses can both terminate continuing expression from preceding
Msn2 pulses, and also establish promoter states with reduced tendency
to activate in response to Msn2, possibly due to residual binding of
Mig1 itself or to Mig1-induced effects on promoter states. As expected,
these extended timing effects required both Msn2 and Mig1 binding
sites on the target promoter, as well as DNA-binding activities of both
proteins (Extended Data Fig. 5d, e). These characteristic timescales for
Msn2-Mig1 pulse interactions establish the degree of simultaneity
necessary for pulses to function as overlapping events.
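As an illustration of the cross-correlation analysis mentioned above, the following minimal sketch computes a normalized cross-correlation between two equally sampled traces. The normalization and edge handling are assumptions made for illustration; the exact procedure used for the figures is not specified in this section. With `x` as the Msn2 trace and `y` as the Mig1 trace, a peak at positive lag corresponds to Msn2 followed by Mig1.

```python
from math import sqrt


def cross_correlation(x, y, max_lag):
    """Normalized cross-correlation of x and y for lags -max_lag..max_lag.

    Returns a dict mapping lag (in frames) to correlation.
    A positive lag means that y follows x.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    norm = sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:  # ignore pairs that fall outside the movie
                total += dx[t] * dy[u]
        result[lag] = total / norm if norm else 0.0
    return result
```

Averaging such curves over many cells would give a population-level cross-correlation of the kind shown for different glucose levels.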


Modulation of relative pulse timing 

Having established the effect of relative pulse timing on gene expression
at steady-state, we next asked whether and how inputs affect relative
timing. We acquired time-lapse movies of Msn2 and Mig1 nuclear localization
across a range of constant glucose concentrations (from 0.4% to
0.0125%), where both Msn2 and Mig1 exhibited sporadic nuclear localization
pulses (Extended Data Figs 6, 7b-e). The frequencies of pulses for
both proteins, and the mean duration of Mig1 pulses, all varied systematically
with glucose concentration (Extended Data Fig. 7a), while mean
Figure 3 | Pulse-triggered averaging reveals relative pulse timing-dependent
gene expression under constant conditions, and modulation
of relative timing by glucose concentration. a, Localization and target
transcription dynamics in a single cell under constant (0.05%) glucose. Msn2
and Mig1 localization are shown in green and red, respectively, while
transcriptional activity of their co-regulated target, GSY1 (GSY1-24xPP7SL)
is shown in blue. Filmstrips show examples of non-overlapping and
overlapping events (indicated by grey shading). White arrows on the upper
filmstrip indicate active transcriptional sites for the target gene. Green and red
horizontal lines below plot indicate identified Msn2 and Mig1 pulses. Green
arrows indicate peaks of the Msn2 pulses used for pulse-triggered averaging
(Methods). b, c, Pulse-triggered averages of Msn2 and Mig1 localization
events sorted into non-overlapping (b, purple; n = 14,384 events) and
overlapping (c, orange; n = 7,829 events) groups. d, e, Pulse-triggered average
transcriptional activity traces for non-overlapping (d) and overlapping (e)
events. Baseline activity (horizontal dashed line) was subtracted from
each trace. Traces are aligned to the peak Msn2 pulse at t = 0 (vertical dashed
line). f, Cross-correlation between Msn2 and Mig1 dynamics at different
glucose levels (see also Extended Data Fig. 7g). g, Glucose levels modulate
the percentage of Msn2 pulses that overlap with Mig1. Circles indicate
measurements of pulse frequency (location of circle) and the percentage of
Msn2 pulses that overlap with Mig1 (overlap fraction, colour of circle) for
nine glucose levels (from 0.4% to 0.0125% as in Extended Data Fig. 8a).
Horizontal contours indicate the overlap fraction expected at each glucose
level assuming independent Msn2 and Mig1 dynamics (Methods). See also
controls in Extended Data Fig. 8f, g. Shading indicates 95% confidence
intervals of the mean.


pulse amplitudes remained approximately constant (Extended Data Fig.
7a). Interestingly, however, averaged cross-correlations between Msn2
and Mig1 nuclear localization traces showed features (for example, the
peak at time lag zero) that depended on glucose concentration (Fig. 3f).
Furthermore, the percentage of Msn2 pulses that overlap with Mig1,
which we define as the overlap fraction, changed systematically with
glucose concentration (Fig. 3g and Extended Data Fig. 8a). Together,
these results indicate that glucose concentration modulates the relative
pulse timing between Msn2 and Mig1 at steady-state conditions.
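The overlap fraction defined above can be sketched as follows. The sketch assumes that an Msn2 pulse counts as overlapping when at least one Mig1 pulse peak falls within a fixed tolerance of the Msn2 peak. Both this criterion and the default `tolerance_min` value are illustrative assumptions, and the function name is hypothetical; the classification rule actually used is described in Methods.

```python
def overlap_fraction(msn2_peaks, mig1_peaks, tolerance_min=5.0):
    """Fraction of Msn2 pulses that have a Mig1 peak within `tolerance_min`.

    msn2_peaks, mig1_peaks -- pulse peak times in minutes
    tolerance_min          -- assumed overlap criterion (hypothetical value)
    """
    if not msn2_peaks:
        return 0.0
    overlapping = sum(
        1
        for m in msn2_peaks
        if any(abs(m - g) <= tolerance_min for g in mig1_peaks)
    )
    return overlapping / len(msn2_peaks)
```

Pooling peak times from all cells at a given glucose concentration before calling the function would yield one overlap fraction per condition.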

To better understand the effect of glucose concentration on relative
pulse timing, it is helpful to distinguish between passive and active
types of modulation. Passive modulation arises from changes in
the frequency and/or duration of Mig1 pulses, and occurs even if
Msn2 and Mig1 dynamics are independent. By contrast, active modulation
would require mechanisms that specifically enhance or reduce
the fraction of overlapping events.
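To make the idea of passive modulation concrete, the following sketch estimates the overlap fraction expected by chance under a deliberately simple assumption: Mig1 pulse peaks occur as a Poisson process that is independent of Msn2. This is an illustrative model only and is not the calculation described in Methods; the function name and the `tolerance_min` parameter are hypothetical.

```python
from math import exp


def expected_overlap_fraction(mig1_rate_per_min, tolerance_min=5.0):
    """Chance overlap fraction for independent, Poisson-distributed Mig1 peaks.

    An Msn2 pulse is counted as overlapping if at least one Mig1 peak falls
    within +/- tolerance_min of the Msn2 peak (assumed criterion).
    """
    window_min = 2.0 * tolerance_min
    # Probability of at least one Poisson event in a window of this length.
    return 1.0 - exp(-mig1_rate_per_min * window_min)
```

Under this model, raising the Mig1 pulse frequency alone raises the expected overlap fraction, which is the sense in which modulation can be passive; an observed overlap fraction above this expectation would point to an active contribution.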

Passive modulation seems to dominate at lower glucose concentrations,
but both passive and active modulation occur at higher glucose
concentrations. At very low glucose levels (<0.05%), the observed
overlap fraction agreed with expectations based on passive modulation
only (Methods, lower right of Fig. 3g and Extended Data
Fig. 8a). However, at higher glucose levels (≥0.05%), where pulse
frequencies became less glucose-dependent (Extended Data Fig. 7a),
the observed overlap fraction exceeded the value expected from
passive modulation, and increased systematically with glucose concentration
(upper left corner of Fig. 3g and Extended Data Fig. 8a),
indicating a substantial role for active modulation. Moreover, including
the active component of modulation improved the ability of
a simple model to explain the dependence of target gene expression
on glucose (Extended Data Fig. 8b-d and Supplementary Discussion).
We also found that relative pulse timing could be further modulated
by other inputs such as NaCl and ethanol (Extended Data Fig. 9 and
Supplementary Discussion). These results show that, under steady-state
conditions, input identity (type of stress) and level (for example,
glucose concentration) together modulate relative pulse timing,
through both passive and active mechanisms, to control target
gene expression.


Mechanism for relative pulse timing modulation 


Relative pulse timing modulation represents a distinct mode of gene
regulation that operates in both steady-state and transient conditions
(Fig. 4a, see also Supplementary Discussion). What mechanisms
could enable cells to actively control relative pulse timing? One possibility
involves regulatory components that specifically generate
overlapping pulses of Msn2 and Mig1. Previous work has shown that
Glc7, the catalytic component of PP1 phosphatase, can indirectly
regulate both Msn2 and Mig1 nuclear localization, making it a candidate
for an active regulator of overlapping pulses (Extended Data
Fig. 10a). We constructed a strain in which the wild-type GLC7 promoter
was replaced with a Cu2+-inducible promoter in the native
locus. In this strain, reducing expression of GLC7 below wild-type
levels abolished active modulation, making the measured overlap
fraction equal to that expected by chance (overlap of red solid and
dashed lines in the left panel of Fig. 4b). This effect can also be seen in
the Msn2-Mig1 cross-correlation at time lag zero, which is reduced at
higher glucose concentrations (compare red and black lines in Fig. 4b,
right). Restoring GLC7 expression close to wild-type levels restored
active modulation (blue lines, Fig. 4b). Together, these data (Fig. 4b
and Extended Data Fig. 10) support a role for Glc7 in active modulation
by glucose (Supplementary Discussion). Other phospho-regulatory
components may also contribute to active modulation in
these and other conditions.


Discussion 


What functions could relative pulse timing modulation provide for
the cell? One of the most fundamental concepts in combinatorial
regulation is that cooperative interactions between transcription factors
can increase their probability of simultaneous binding to a promoter,
to implement cis-regulatory logic. By controlling the fraction
of time that two transcription factors are simultaneously active, relative
pulse timing modulation could provide similar effects in trans
Figure 4 | Mechanistic aspect of relative pulse timing modulation. a, In
gene regulation by relative pulse timing modulation (schematic), the
identity and level of inputs (yellow and brown circles) regulate target gene
expression through changes in the relative timing of Msn2 and Mig1 pulses
(see Supplementary Discussion). Overlapping events (orange) only activate
Msn2-specific targets while non-overlapping events (purple) activate both
Msn2-specific and Msn2-Mig1 combinatorial targets. In steady-state (right),
inputs modulate the fraction of Msn2 pulses that overlap with Mig1 (pie
charts). b, GLC7 mediates active modulation of relative pulse timing, possibly
by activating both Msn2 and Mig1 (schematic inset, left). Left, measured
(solid line) and expected (dashed line) overlap fractions were plotted for three
conditions: wild-type (black), reduced GLC7 expression (red), and the same
strain with GLC7 expression restored to approximately wild-type levels (blue).
Right, average cross-correlation between Msn2 and Mig1 dynamics for
three glucose levels (percentages). Shading and error bars indicate 95%
confidence intervals of the mean.


(Supplementary Note and Extended Data Fig. 10f-h). In addition to 
its functionality, a number of basic issues about timing-dependent 
regulation remain to be understood. For example, what accounts 
for variability among cells in their transcription factor dynamics 
and the apparently stochastic response of target promoters to those 
dynamics? What features of target promoters, such as the kinetic 
parameters that govern their activation, determine whether and 
how they respond to timing-based regulation? 

Relative timing between signals plays many important roles
throughout science and engineering. In neuroscience, the relative timing
of action potentials at pre- and post-synaptic neurons controls the
strength of synaptic connectivity through spike-timing-dependent
plasticity. In communications, modulating the phase of a periodic
signal relative to a reference signal is widely used to encode information.
Cells seem to have evolved a related strategy by encoding aspects
of the extracellular environment in the relative timing with which
different transcription factors pulse. The unsynchronized nature of
these pulses (at steady-state) has made relative pulse timing modulation
rather difficult to detect and characterize previously. However,
pulsatile dynamics (both periodic and aperiodic) are now being
discovered in a growing list of central signalling and regulatory pathways,
which are known to interact, or crosstalk, with one another. It
will therefore be critical to more systematically map the temporal
organization of cellular pathways, and determine principles that can
explain both the mechanisms and functions of relative pulse timing
modulation in living cells.


5 NOVEMBER 2015 | VOL 527 | NATURE | 57


©2015 Macmillan Publishers Limited. All rights reserved


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Strain construction. Standard protocols were used for molecular cloning.
Plasmids were replicated in either TOP10 or DH5α Escherichia coli. Except where
indicated, all yeast strains used in this study were constructed based on BY4741
(MATa his3Δ0 leu2Δ0 met15Δ0 ura3Δ0), where msn4, mig2, nrg1, nrg2 were
further deleted (seamless deletion) or compromised (with auxotrophic or drug
markers) to avoid complications resulting from these proteins binding to Msn2 or
Mig1 binding sites. All yeast transformations were performed with a standard
lithium-acetate protocol or with the Frozen-EZ Yeast Transformation II Kit
(Zymo Research). Resulting constructs were confirmed with PCR and/or sequencing.
Details of strain genotypes are listed in Supplementary Table 1.

For endogenous gene fusion, MSN2-mKO2::LEU2 and MIG1-mCherry::spHIS5
were constructed by a fusion PCR approach, in which the PCR product comprised
300-500 bp of the 3' end of the target of interest, the mKO2 or mCherry gene,
the LEU2 or spHIS5 cassette, and another 300-500 bp downstream of the target.
More specifically, mCherry::spHIS5 was directly PCR amplified from pKT355 plasmid,
mKO2 gene was obtained from Amalgaam Co., Ltd, and LEU2 was amplified
from pRS315 plasmid. Fused PCR products were directly transformed. For RNA
binding protein fusion PP7-2×GFP, pDZ276 plasmid (a gift from R. Singer) was
directly used for transformation into yeast.

Synthetic promoters driving either the 24xPP7SL binding cassette (for single-cell
3-colour movies) or mKO2 (for qPCR measurements) are composed of the following
elements: ADH1 terminator-UAS-basal HIS3 promoter (−101 to −1 of
HIS3 gene)-24xPP7SL cassette with ADH1 terminator or mKO2-KANMX or
NATMX resistance cassette. ADH1 terminator and KANMX cassette were
obtained from pKT vectors. NATMX was obtained from pAG25 plasmid.
Basal HIS3 promoter was amplified from yeast genome. The 24xPP7SL cassette
was obtained from Addgene (plasmid 31864). mKO2 was used for qPCR analysis
because it is exogenous to yeast genome. Three different UAS cassettes contained
one or both of the following elements: 4 copies of Msn2 binding motif
(GATCTACAGCCCCTGGAAAAT, adopted from HSP12 promoter) and/or
2 copies of Mig1 binding motif (AATAAAAATGCGGGGAA, adopted
from SUC2 promoter). These UAS cassettes were used to generate Msn2-specific,
Mig1-specific, and Msn2-Mig1 combinatorial promoters. The entire constructs
were flanked with sequences for integration into TRP1 locus of BY4741
and were assembled into a pKT based vector. The plasmids were digested
with AfeI to release the entire cassette for integration into respective yeast
strains. GSY1-24xPP7SL (for 3-colour movies) was generated by integration of
24xPP7SL::KANMX cassette directly downstream of the endogenous GSY1 gene.

Zinc finger deletion mutants of Msn2 and Mig1 proteins were constructed by
direct transformation of PCR fragments containing desired mutations. Specifically,
a fused PCR product containing MIG1(Δamino acids 36-91)-mCherry::spHIS5 was
used to generate Mig1-mCherry with its DNA binding domain deleted. Similarly, a
fused PCR product containing MSN2(Δaa642-704)-mKO2::LEU2 was used for
Msn2 zinc finger mutation. Deletion of Mig1 zinc finger appeared to impact its
regulation of nuclear localization as the mutated Mig1-mCherry became much
more nuclear localized. This effect, however, does not affect our conclusions.

Copper-inducible GLC7 strain was constructed by transforming a fusion PCR
product of URA3-TEF terminator-CUP1 promoter flanked with sequences for
integration to replace the endogenous GLC7 promoter. Transformants were
selected on plates containing 100 μM CuSO4.
Media and growth conditions. We adopted a minimal media formula with low
auto-fluorescence for both culturing yeast cells and for microscopy. Stock solutions
for minerals (1,000×), vitamins (1,000×), as well as salts (50×) were made
separately. Final working media was made by mixing these three components
together with amino acid drop-out mix (from Clontech) and Milli-Q water.
Media was adjusted to desired glucose concentration with a glucose stock
(40%, w/v).
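As a worked example of adjusting media to a desired glucose concentration, the required stock volume follows from the standard dilution relation C1V1 = C2V2. The sketch below assumes the 40% (w/v) stock stated above; the function name and the example volumes are hypothetical.

```python
def glucose_stock_volume_ml(target_percent, final_volume_ml, stock_percent=40.0):
    """Volume of glucose stock (ml) needed for the target concentration.

    Uses C1 * V1 = C2 * V2 with concentrations in % (w/v).
    """
    if not 0 <= target_percent <= stock_percent:
        raise ValueError("target must lie between 0 and the stock concentration")
    return target_percent * final_volume_ml / stock_percent
```

For instance, 100 ml of 0.2% glucose media would require 0.5 ml of the 40% stock, with the remainder made up of the other media components and water.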

For overnight liquid culture, single colonies of yeast were picked from agar
plates made with minimal media and dispensed into 2-3 ml of minimal media
(2% glucose, −Ura or −His −Leu −Ura) in 14 ml round-bottom polypropylene
tubes (BD catalogue no. 352059). Cells were grown in a 30 °C shaking incubator.
The media and overnight culture procedures were the same for both single-cell
microscopy and qPCR experiments. For microscopy, media was supplemented
with 2 mM sodium ascorbate (Sigma catalogue no. A7631) and 200 μM trolox
(Sigma catalogue no. 238812) (except for media with H2O2) to help reduce
fluorescent protein photobleaching and phototoxicity to cells.

Time-lapse microscopy. All time-lapse experiments were performed on an
Olympus IX81 microscope with a 60× objective and hardware autofocus (ZDC2).
Fluorescence was excited by an LED light source (Lumencor SOLA Light Engine) and
collected onto a scientific CMOS camera (Andor Neo sCMOS) with a 2-by-2 bin
setting. For mKO2 and mCherry, single z-plane images were acquired. For GFP,
a 5-slice z-stack was acquired (0.8 μm separation). The excitation and emission filters
for mKO2, mCherry and GFP are: Ex 534/20 and Em 572/28, Ex 580/20 and Em 630/60,
and Ex 472/30 and Em 535/50, respectively. The frame rate is 1 frame min−1.
Time-lapse movie automation was performed with Micro-Manager. The entire
microscope room was maintained at ~26 °C with two heater fans and a temperature
controller (Omega Engineering catalogue no. FCH-FGC20012R and catalogue no.
CSC32).

Movies were acquired for single cells cultured in a dual-inlet microfluidic
channel (~500 μm wide), which enables media switching. The microfluidic
device was fabricated with polydimethylsiloxane (PDMS) with a Sylgard 184
silicone elastomer kit (Dow Corning) and bonded to a 24 mm × 50 mm glass
coverslip (Gold Seal No. 1.5) after air-plasma cleaning (Harrick Plasma PDC-32G).
The channels were cleaned by brief incubation with 2 M NaOH, followed
by washes with 100% ethanol and water. A 15 mg ml−1 concanavalin A (Sigma
catalogue no. C7275) solution was incubated in the channels for about 10 min to
coat the surface for adhering single yeast cells. Channels were washed with media
before cell loading. Overnight yeast cultures were diluted back to OD600 nm = 0.1
with 2 ml of fresh media (0.2% glucose, −Ura) and were allowed to grow for
another ~3 h. Cells were briefly concentrated by centrifugation and loaded into the
channel. Cells were incubated in the channels for 5 min. The device was then
loaded onto a sample stage on the microscope. Two inlets of a channel were
connected with tubing (Weico Wire & Cable catalogue no. TT-30) to two different
media solutions in 10 ml syringes (BD catalogue no. 309604) containing different
glucose or stimulant concentrations. These syringes were driven with separate
syringe pumps (Harvard Apparatus Pump 11 elite) which were controlled by
Micro-Manager. Outlet of the channel was connected to a waste container. Media
flow rate was maintained at 5 μl min−1 throughout the movie except during
media change (at 50 μl min−1 for 2 min).

The starting glucose concentration and the time before media switching differed
in different experiments. For the transient glucose shift experiment
(Fig. 1b), cells were in the channel with flowing 0.2% glucose media for more
than 2 h before switching to 0.1% glucose (acquisition of fluorescent images
started 30 min before switching). For experiments in Fig. 2, cells were in the
channel with flowing 0.05% glucose media for more than 2 h before switching
to 0.05% glucose plus specified stressor. For steady-state experiments in Figs 3, 4,
cells were in the channel with flowing 0.2% glucose media for at least 10 min
before switching to 0.05% or other designated glucose levels (from 0.4% to
0.0125%). Acquisition of fluorescent images started 110 min after switching
media conditions (that is, at steady-state). For copper-inducible GLC7 experiments
(Fig. 4b), cells were cultured with minimal media without the addition of
copper until they were switched to a media containing 10 μM CuSO4 for 110 min
before the acquisition of fluorescent images.
Image analysis for extracting single-cell traces. Single-cell traces were extracted
from fluorescence images based on cell tracking performed on bright-field images.
All analyses were implemented with custom Matlab code (with some modules
obtained online as cited below). More specifically, a slightly defocused bright-field
image was taken at each frame for segmentation and tracking purposes.
Segmentation was performed by circular Hough transformation (CircularHough_Grd
function from Mathworks File Exchange). Segmented cell masks were first
aligned across the entire movie frames to roughly correct for x-y stage drift (in order
to enhance tracking accuracy). The masks were then fed into a tracking algorithm
(u-track) to obtain final cell tracks. Tracks were examined manually, and those
with errors were discarded and removed from further analysis. These filtered
single-cell tracks were used to extract fluorescence traces.

For analysis of z-stack GFP images (for real-time transcription), we used a
maximal intensity z-projection. Fluorescent images were then background
subtracted (using background images acquired with media only) and corrected for
field flatness caused by uneven illumination (using an image taken with
fluorescein). Nuclear localization was calculated by the difference between the mean
intensity of the top five pixels and the median intensity of all pixels. Single-cell
nuclear localization traces for mKO2 and mCherry were then obtained with
tracks obtained above. Real-time transcriptional activity (that is, PP7-2×GFP
signal) was calculated as the intensity of the brightest pixel in the cell minus its
local background. For the time when transcription is active, the brightest pixel
coincides well with the transcription hotspot. Nuclear localization and
transcriptional activity measurements were validated by manual examination of the
extracted traces side-by-side with fluorescence images.
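The two per-cell measurements described above can be sketched as follows. The sketch assumes the pixel intensities of one segmented cell are supplied as a flat list after background subtraction and flat-field correction. How the local background was estimated is not specified here, so it is passed in as an argument; the function names are hypothetical and this is not the custom Matlab code used in the study.

```python
from statistics import mean, median


def nuclear_localization_score(cell_pixels, top_n=5):
    """Mean of the `top_n` brightest pixels minus the median of all pixels."""
    brightest = sorted(cell_pixels, reverse=True)[:top_n]
    return mean(brightest) - median(cell_pixels)


def transcriptional_activity(cell_pixels, local_background):
    """Brightest pixel in the cell minus a supplied local background value."""
    return max(cell_pixels) - local_background
```

A diffuse signal gives a localization score near zero, whereas a signal concentrated in a few nuclear pixels gives a high score.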

Single-cell trace analysis and pulse-triggered averaging analysis. Single-cell 
traces were first baseline-subtracted and nuclear localization pulses were iden- 
tified. These pulses were then characterized and used for pulse-triggered aver- 
aging analysis. More specifically, calculation of the baselines for mKO2 and 
mCherry traces was based on a measure for the degree of nuclear localization. 
In this method, pairwise spatial distance summed over the top 10 brightest pixels 
in individual cells was used to determine whether the fluorescence signal is nuclear 
localized at a given frame. Nuclear localization scores from frames with the 
summed distance above a predefined threshold were used to estimate the baseline 
by a polynomial fit. For cases in which the baseline varied too much along a trace, 
baseline was estimated by fitting only the nuclear localization values that were 
below an empirically defined threshold. Baseline for GFP signal was estimated by 
polynomial fitting the GFP signals that were below an empirically defined thresh- 
old. Baseline subtraction procedures were validated by manual examination. 
Nuclear localization pulses were identified in both Msn2 and Mig1 traces. Pulse 
identification was based on iPeak (from Mathworks File Exchange). Shoulder 
peaks were filtered out and combined with neighbouring peaks (with higher 
amplitude). The remaining peaks were filtered based on an amplitude threshold 
(at least 20% above the baseline values) as well as the summed pairwise distance 
(below a predefined threshold). Width of the pulses was measured for left and 
right portions of the pulses separately (first fitted with spline and then measured 
at half of the pulse amplitude or the amplitude threshold, whichever is smaller). 
For pulse-triggered averaging analysis, a 21 min window around the peak of each 
Msn2 pulse (that is, 10 min on each side) was used for classifying the relative 
timing. This time window was chosen based on the frequency of Msn2 pulses. 
Within this window, all Mig1 pulses were identified. If the peak of a triggered 
Msn2 pulse fell into the span (defined by pulse width) of a Mig1 pulse, it was 
classified as an overlapping event. Otherwise, it was classified as a non-overlapping 
event. A more detailed classification based on the distance between the peak 
of the Msn2 pulse and the edge (defined by pulse width) of the Mig1 pulse (if multiple 
Mig1 pulses occur within the window, the one with maximum pulse amplitude 
was chosen for this classification) can also be done as shown in Extended Data Fig. 
5. Overlapping and non-overlapping events were averaged separately. Note that a 
larger time window (that is, a 26 min window with 10 min on the left of the peak 
and 15 min on the right) was chosen for averaging in Fig. 3 and Extended Data 
Figs 4, 5 in order to capture and measure the prolonged transcriptional responses 
in the GFP dynamics. 
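
The binary overlap classification described above can be sketched as follows. This is an illustrative Python version, not the authors' code: the tuple layout for a Mig1 pulse and the rule that a Mig1 pulse counts as being "within the window" when its peak lies within 10 min of the Msn2 peak are assumptions of the sketch.

```python
def classify_msn2_pulse(msn2_peak, mig1_pulses, half_window=10.0):
    """Classify one Msn2 pulse by its timing relative to Mig1 pulses.

    msn2_peak: time of the Msn2 pulse peak (min).
    mig1_pulses: list of (left_edge, peak, right_edge) tuples (min), where the
        edges are the pulse span defined by the measured pulse width.
    Returns "overlapping" if the Msn2 peak lies inside the span of a Mig1
    pulse found within the window, otherwise "non-overlapping".
    """
    for left_edge, peak, right_edge in mig1_pulses:
        in_window = abs(peak - msn2_peak) <= half_window  # assumed window rule
        if in_window and left_edge <= msn2_peak <= right_edge:
            return "overlapping"
    return "non-overlapping"


# Hypothetical Mig1 pulses in one single-cell trace.
mig1 = [(3.0, 5.0, 8.0), (20.0, 22.0, 25.0)]
first = classify_msn2_pulse(6.0, mig1)
second = classify_msn2_pulse(12.0, mig1)
```

Events in each class would then be aligned on the Msn2 peak and averaged separately.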
Cross-correlation analysis. Several figures include cross-correlation analysis 
(Figs 3f, 4b and Extended Data Figs 1g, 7g and 9d). In these cases, we first compute 
the cross-correlation function for each cell in a given data set, and then average 
the resulting functions. Individual cross-correlations were based on mean-subtracted 
signals and normalized, computed using the following expression: 

$$C_{xy}(\tau) = \frac{\langle (x(t) - \langle x \rangle)\,(y(t+\tau) - \langle y \rangle) \rangle}{\sqrt{\langle (x(t) - \langle x \rangle)^{2} \rangle \, \langle (y(t) - \langle y \rangle)^{2} \rangle}}$$

Here, angled brackets denote means, and $C_{xy}(\tau)$ is the cross-correlation of x(t) 
and y(t) at time lag $\tau$. 
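
A minimal Python sketch of this per-cell calculation followed by population averaging is given below. It assumes equally sampled traces of equal length and normalizes by the zero-lag norms of the mean-subtracted signals (a biased estimate at non-zero lags); both the function names and that choice of estimator are assumptions, since the text does not state how finite-length effects were handled.

```python
from math import sqrt


def cross_correlation(x, y, lag):
    """Normalized cross-correlation of x(t) with y(t + lag) for one cell."""
    n = len(x)
    if n != len(y):
        raise ValueError("traces must have the same length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    norm = sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    if norm == 0:
        raise ValueError("a constant trace has no defined correlation")
    # Sum only over time points where both x(t) and y(t + lag) exist.
    total = sum(dx[t] * dy[t + lag] for t in range(n) if 0 <= t + lag < n)
    return total / norm


def average_cross_correlation(cells, lags):
    """Average the per-cell cross-correlation functions over a data set.

    cells: list of (x, y) trace pairs, one pair per cell.
    """
    return [
        sum(cross_correlation(x, y, lag) for x, y in cells) / len(cells)
        for lag in lags
    ]


# Hypothetical traces: y is the mirror image of x, so the zero-lag value is -1.
x = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
y = [-v for v in x]
```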

Quantitative PCR analysis. We used qPCR analysis to validate the single-cell 
transcriptional response, with similar culture procedures for both microscopy 
and RNA analysis. In this protocol, cells were exposed to defined stimulants for 
10 min and RNA was extracted for two-step RT-qPCR (reverse transcription 
followed by qPCR). Note that the concentrations of salt, ethanol, and H₂O₂ were 
doubled when compared to the microfluidic single-cell assay (that is, 200 mM 
versus 100 mM NaCl, 5% ethanol versus 2.5% ethanol, 0.5 mM versus 0.25 mM 
H₂O₂). Overnight cultures were diluted to OD₆₀₀ nm = 0.075 with 20 ml of 0.2% 
glucose (−Ura) in a 250 ml flask and allowed to grow until the OD₆₀₀ nm reached 
above 0.2 (about 3–4 h). For transient stress experiments, cultures were then 
diluted back to OD₆₀₀ nm = 0.2 with 20 ml of 0.05% glucose in a 250 ml flask and 
allowed to grow for another 2 h. Cultures were split into 14 ml polypropylene 
tubes (4 ml each). Stresses were applied by mixing concentrated stock solutions 
(such as 5 M NaCl, 100% ethanol, 0.83 M H₂O₂) with the culture or by moving the 
culture tubes to a 37 °C shaking incubator (for heat shock). After precisely 10 min 
of stress application, each culture was mixed with 6 ml pre-chilled methanol (with 
dry ice/ethanol bath) in a 50 ml tube to rapidly fix the cells. For steady-state 
experiments, cells were diluted to OD₆₀₀ nm = 0.1 with 4 ml fresh media of the designated 
condition (different glucose concentration with or without additional 
CuSO₄) in a 50 ml Falcon tube. Cultures were allowed to grow for 2 h and cells were 
mixed quickly with cold methanol as above. After >1 h in cold methanol, cells 
were collected by centrifuging at 4°C and washed with ice cold water. Prior to 
performing standard RNA extraction protocols (with on-column DNase digestion) 
with RNeasy mini kits (Qiagen), cells were enzymatically treated with 100 µl 
of 2 U µl⁻¹ lyticase solution (Sigma catalogue no. L2524) for 10 min at 30 °C. The 
extracted RNA absorbance spectrum was analysed with NanoDrop and 1 µg RNA 
was used for a standard 20 µl iScript (Bio-Rad) reverse transcription reaction. The 
resulting cDNA was diluted 4× with water before proceeding to the qPCR reaction. A 
typical 10 µl qPCR reaction was assembled with 5 µl iQ SYBR Green Supermix 
(Bio-Rad), 2 µl primers (1.5 µM each), 2 µl of cDNA, and 1 µl of water. Reactions 
were performed on a CFX96 Real-Time machine (Bio-Rad). Each reaction had 
≥2 technical replicates. Three reference genes were included (ACT1, UBC6, 
TFC1) for each sample. The latter two were based on recommendations by 
Teste et al. The mean Cq values of these reference genes were used for the 
calculation of ΔΔCq (or fold-change as $2^{-\Delta\Delta\mathrm{Cq}}$) for each gene between sample and 
control. Calculations of ΔΔCq were done by CFX Manager Software (Bio-Rad) 
and final processing was performed by Matlab (Mathworks). Error bars were 
calculated by taking the standard errors of ≥3 biological replicates. Primers were 
designed according to the manufacturer's instructions for iQ SYBR Green and were 
blasted against the yeast transcriptome (Primer-BLAST) to avoid nonspecific 
priming. The following primer sequences were used: 

ACT1_F: ACATCGTTATGTCCGGTGGT; ACT1_R: CATGGAAGATGGAGCCAAAG; 
UBC6_F: AGGACCTGCGGATACTCCTT; UBC6_R: TCTGATAGCCGGTGGTTTGT; 
TFC1_F: AGCGCTGGCACTCATATCTT; TFC1_R: TTGGGCGTATTCCACTGAAC; 
mKO2_F: GTGATCAAGCCCGAGATGAA; mKO2_R: CATCTCCTGATGTCCCTCGT; 
GSY1_F: ACTGGTTGATTGAGGGAGCA; GSY1_R: GACCATAGGTCAGCCTTCCA; 
EMI2_F: AATGGTGACGGAACCTTTGA; EMI2_R: GCGACCCAGGTAGCTAAACA; 
GLC3_F: CCGCTCCATAGGTGGTACTG; GLC3_R: ACTTCCCATCTCCCATTCATC; 
GPH1_F: TCTGGCCACCCATGAATTAG; GPH1_R: GCAACGCTCAGGACACTCTT; 
IGD1_F: AGCAATGGTAACAGCGCAAG; IGD1_R: CTCCAAACATGTGAAGCTGGT. 
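
The ΔΔCq fold-change calculation referred to in this section can be illustrated with a short Python sketch. The authors used CFX Manager Software for this step; the code below is only the textbook form of the calculation, assuming 100% amplification efficiency and normalization to the mean Cq of the reference genes. The function names and the Cq values are hypothetical.

```python
def delta_delta_cq(target_sample, refs_sample, target_control, refs_control):
    """Return ΔΔCq for one target gene between a sample and its control.

    target_*: Cq of the target gene; refs_*: Cq values of the reference genes.
    """
    delta_sample = target_sample - sum(refs_sample) / len(refs_sample)
    delta_control = target_control - sum(refs_control) / len(refs_control)
    return delta_sample - delta_control


def fold_change(ddcq):
    """Fold-change as 2^(-ΔΔCq), assuming perfect doubling per cycle."""
    return 2.0 ** (-ddcq)


# Hypothetical Cq values: three reference genes, one target gene.
ddcq = delta_delta_cq(
    target_sample=22.0, refs_sample=[18.0, 20.0, 22.0],
    target_control=25.0, refs_control=[18.0, 20.0, 22.0],
)
fc = fold_change(ddcq)
```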

RNA-Seq library construction and data analysis. For data shown in Extended 
Data Fig. 2, RNA-Seq was performed with libraries prepared from the RNA 
samples collected from cells of three different strains (no deletion strain and 
deletions of either msn2 or mig1) subjected to no treatment (control), 
200 mM NaCl, or 2.5% ethanol. For data shown in Extended Data Fig. 8d, 
RNA-Seq was performed with libraries prepared from the RNA samples collected 
from cells of the no deletion strain across 9 glucose concentrations and one msn2 
deletion strain at 0.2% glucose. RNA sample preparation was similar to the 
descriptions in the previous section. Libraries were constructed according to standard 
Illumina protocols. Sequencing was performed on a HiSeq 2500 sequencer. 
Both library construction and sequencing were performed at the core sequencing 
facility at Caltech. For the transient experiments, two biological replicates for each 
sample collected on different days were sequenced and analysed. Analysis of the 
sequencing data was performed with a local instance of Galaxy. A standard 
analysis pipeline was used (alignment with Tophat). Statistical tests of differential 
expression between conditions were performed with duplicates using 
DESeq2. 

Calculation of expected-by-chance fraction of overlapping pulsing. The heat 
map in Fig. 3g showed the expected fraction of Msn2 pulses that overlap with 
Mig1 pulses. This expected fraction measures the percentage of Msn2 pulses that 
would coincide with Mig1 pulses assuming the factors pulse independently of 
each other. Because an overlapping event is defined as when the peak of an Msn2 
pulse falls into the time span of a Mig1 pulse, its expected fraction can be calculated 
as the fraction of time that Mig1 pulses occupy and is independent of Msn2: 

$$f_{\mathrm{expected}} = \frac{\text{number of Mig1 pulses per hour} \times \text{mean Mig1 duration}}{1\ \text{hour}}$$

As shown in Extended Data Fig. 8a, this calculated expected-by-chance overlap 
fraction is almost identical to the measured overlap fraction from an artificial 
population of cells where Msn2 and Mig1 dynamics are scrambled to enforce 
independence. 
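
As a worked illustration of the duty-cycle calculation above, the following Python sketch computes the expected-by-chance overlap fraction from a Mig1 pulse rate and mean duration. The function name and the example numbers are hypothetical, and the sketch assumes that Mig1 pulses do not overlap one another, so their durations add.

```python
def expected_overlap_fraction(mig1_pulses_per_hour, mean_mig1_duration_min):
    """Fraction of time occupied by Mig1 pulses (the Mig1 duty cycle).

    Under independent pulsing, this equals the expected fraction of Msn2
    pulse peaks that fall inside a Mig1 pulse.
    """
    return mig1_pulses_per_hour * mean_mig1_duration_min / 60.0


# Hypothetical values: 3 Mig1 pulses per hour, each lasting 4 min on average.
expected = expected_overlap_fraction(3, 4.0)
```

With these example values, Mig1 pulses occupy 12 of every 60 min, so one fifth of Msn2 pulses would be expected to overlap by chance.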

Fitting gene expression data with different models. In Extended Data Fig. 8b-d, 
we compared the ability of three models to fit combinatorial target gene expression 
levels across a range of glucose concentrations. The first model (‘active-passive’) 
includes both active and passive modulation, the second model (‘passive 
only’) includes only passive modulation, that is, assumes independent Msn2 and 
Mig1 dynamics, and the third model (‘Msn2 only’) assumes Mig1 does not reduce 
the effect of the Msn2 pulses (see Supplementary Discussion). In all models, gene 
expression is assumed to be activated by Msn2 and also to occur at a basal level in the 
absence of nuclear Msn2. Mig1 is assumed to suppress both Msn2-activated 
(except in the Msn2 only model) and basal expression. In these models, expression 
is thus proportional to the frequency of effective Msn2 pulses (those not 
suppressed by Mig1 pulses, see definition below), plus the promoter-specific basal 
activity: 


$$E^{i}_{\mathrm{model}} = a\, f^{i}_{\mathrm{Msn2eff}} + b\, p^{i}_{\mathrm{Mig1out}}$$

Here, i labels the glucose condition; a denotes the mean amount of gene expression 
produced by each effective Msn2 pulse; $f^{i}_{\mathrm{Msn2eff}}$ is the frequency of effective 
Msn2 pulses per hour (calculated based on single-cell data, see details below); b is 
the basal promoter activity when Mig1 is out of the nucleus; and $p^{i}_{\mathrm{Mig1out}}$ is the 
fraction of time that Mig1 is out of the nucleus (also calculated based on single-cell 
data). Note that the three models differ only in the effective Msn2 pulse 
frequency. In general, the active-passive model has the lowest $f_{\mathrm{Msn2eff}}$, because 
in this model Mig1 pulses suppress the effects of Msn2 pulses even more frequently 
than expected if Msn2 and Mig1 were independent, that is, in the ‘passive 
only’ model. In contrast, the ‘Msn2 only’ model has the highest $f_{\mathrm{Msn2eff}}$. 

We calculated the effective Msn2 frequency, $f_{\mathrm{Msn2eff}}$, with two different levels of 
temporal precision (see Supplementary Discussion). The simpler binary relative 
timing model considers Msn2 pulses to be either overlapping or non-overlapping 
with Mig1, as in Fig. 3. By contrast, the more precise continuous relative timing 
model allows for the empirically observed continuous dependence of expression 
level on the time interval between the Msn2 and Mig1 pulses, as shown in 
Extended Data Fig. 5b. 

In the binary model, the effective Msn2 pulse frequency is simply the frequency 
of non-overlapping Msn2 pulses (Fig. 3). In the continuous model, the effect of an 
observed Msn2 pulse on a natural target’s gene expression was determined by its 
pulse timing relative to Mig1 using the results in Extended Data Fig. 5b. More 
specifically, we normalized the data in Extended Data Fig. 5b such that Msn2 only 
pulses (those at the longest absolute time intervals) have a relative expression level 
of 1, while overlapping Msn2 pulses (time interval 0) have a relative expression 
level of 0. For each observed Msn2 pulse we calculated an effective gene expression 
contribution based on its timing relative to Mig1. This calculation was performed 
across all traces and all glucose concentrations to obtain $f^{i}_{\mathrm{Msn2eff}}$. Prior to 
fitting, we converted the relative qPCR expression data to an absolute scale 
(equivalent to FPKM, fragments per kilobase of transcript per million mapped 
reads) using the RNA-seq data at 0.05% glucose as a reference (Extended Data 
Fig. 2f). We also used RNA-seq data from an msn2 mutant to independently 
estimate parameter b. Thus, for each of the three models, only the parameter a 
needs to be fit. The least-squares fitting was performed by minimizing the error 
function $\sum_{i} \left[ E^{i}_{\mathrm{model}} - E^{i}_{\mathrm{exp}} \right]^{2}$, where $E^{i}_{\mathrm{exp}}$ denotes the experimentally measured gene 
expression levels at glucose level i from qPCR and RNA-seq data sets. 
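
Because b is fixed independently, the model is linear in the single free parameter a, and the least-squares minimum has a closed form. The Python sketch below uses that closed form as one possible route; the text does not say which solver the authors used, and the function name and input values are hypothetical.

```python
def fit_a(f_eff, p_out, e_exp, b):
    """Least-squares estimate of a in E_model = a * f_eff + b * p_out.

    f_eff: effective Msn2 pulse frequency per glucose condition.
    p_out: fraction of time Mig1 is out of the nucleus per condition.
    e_exp: measured expression per condition.
    b: basal promoter activity, fixed from an independent measurement.
    Setting the derivative of sum((a*f + b*p - E)^2) with respect to a
    to zero gives a = sum(f * (E - b*p)) / sum(f^2).
    """
    if not (len(f_eff) == len(p_out) == len(e_exp)):
        raise ValueError("inputs must have one value per glucose condition")
    denominator = sum(f * f for f in f_eff)
    if denominator == 0:
        raise ValueError("effective pulse frequencies are all zero")
    numerator = sum(f * (e - b * p) for f, p, e in zip(f_eff, p_out, e_exp))
    return numerator / denominator


# Hypothetical data generated from a = 4 and b = 2, so the fit recovers a = 4.
f_eff = [1.0, 2.0, 3.0]
p_out = [0.5, 0.5, 1.0]
e_exp = [5.0, 9.0, 14.0]
a_hat = fit_a(f_eff, p_out, e_exp, b=2.0)
```

The same function applies to all three models, since they differ only in the effective pulse frequencies passed in.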

Statistical analysis. To compare single-cell data between different conditions, we 
computed the 95% confidence intervals of the sample mean for each set of single 
cells by the bootstrap method. More specifically, resampling with replacement 
was implemented with Matlab and 2,000 resamplings of the same sample size 
were obtained for each set of single cells. These 2,000 sets of single-cell data were 
then used for downstream analysis such as pulse-triggered averaging analysis and 
others. Bias-corrected 95% confidence intervals of the 2,000 samples were then 
calculated and represented as error bars or shaded regions. To compare distribu- 
tions, the Kolmogorov-Smirnov test was used. 
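
The bootstrap procedure above can be sketched in Python as follows. This is an illustration only: it applies the resampling directly to the sample mean, uses the bias-corrected percentile method without an acceleration term, and fixes a random seed for reproducibility. Whether the authors' Matlab implementation included an acceleration term is not stated, and all names and data here are hypothetical.

```python
import random
from statistics import NormalDist, mean


def bootstrap_means(values, n_resamples=2000, seed=0):
    """Means of n_resamples resamples (with replacement, same sample size)."""
    rng = random.Random(seed)
    n = len(values)
    return [mean(rng.choices(values, k=n)) for _ in range(n_resamples)]


def bias_corrected_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Bias-corrected bootstrap confidence interval of the sample mean."""
    boots = sorted(bootstrap_means(values, n_resamples, seed))
    observed = mean(values)
    normal = NormalDist()
    # Bias correction: where the observed mean sits in the bootstrap
    # distribution, clamped away from 0 and 1 so inv_cdf is defined.
    below = sum(1 for b in boots if b < observed) / n_resamples
    below = min(max(below, 1.0 / n_resamples), 1.0 - 1.0 / n_resamples)
    z0 = normal.inv_cdf(below)
    low_p = normal.cdf(2 * z0 + normal.inv_cdf(alpha / 2))
    high_p = normal.cdf(2 * z0 + normal.inv_cdf(1 - alpha / 2))
    low_i = min(n_resamples - 1, max(0, int(low_p * n_resamples)))
    high_i = min(n_resamples - 1, max(0, int(high_p * n_resamples)))
    return boots[low_i], boots[high_i]


# Hypothetical single-cell measurements.
values = [float(v) for v in range(1, 21)]
low, high = bias_corrected_ci(values)
```

For downstream quantities such as pulse-triggered averages, the same resampled sets of cells would be passed through the full analysis before the interval is taken.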

No statistical methods were used to predetermine sample size. The experi- 
ments were not randomized, and the investigators were not blinded to allocation 
during experiments and outcome assessment. 

Extended Data Figure 1 | Single-cell analysis of relative pulse timing 
modulation by stress identity during transient response. a–c, Example traces 
for synthetic combinatorial (a), Msn2-specific (b), or Mig1-specific (c) 
promoters, in response to addition of 100 mM NaCl. Two cells are shown for 
each strain. For each cell, Msn2 and Mig1 localization traces (green and red) 
and the corresponding promoter response (blue) are shown on separate panels 
(top and bottom). Vertical dashed line indicates time of NaCl addition. 
d–f, Similar example traces for the response to addition of 2.5% ethanol. 
g, Average cross-correlation function of the transient Msn2 and Mig1 
responses from t = 0–30 min after indicated stress. Cross-correlation between 
Msn2 and Mig1 is negative at time lag zero for both glucose reduction and NaCl 
stresses, but positive for ethanol stress. h, i, Averaged (left) and single-cell 
(right) nuclear localization traces of Msn2-mKO2 and Mig1-mCherry in 
response to 37 °C heat shock (h) or 0.25 mM H₂O₂ (i). j, k, Msn2 and Mig1 
dynamics observed in Fig. 2b, c do not depend on the deletions introduced to 
the strain background. Averaged nuclear localization traces of Msn2-mKO2 
and Mig1-mCherry in response to 100 mM NaCl (j) or 2.5% ethanol (k) for 
a control strain without msn4 mig2 deletions. Shading indicates 95% confidence 
intervals of the mean. l–n, Standard deviation representations of different 
sets of single-cell data (presented in main figures). The mean is indicated with 
a solid line, and ±1 standard deviation ranges are indicated by shading. 
l, Nuclear localization responses of Msn2-mKO2 (green) and Mig1-mCherry 
(red) to downshift in glucose level (see Fig. 1b). m, n, Nuclear localizations and 
transcriptional responses to NaCl and ethanol (see Fig. 2b, c). 

Extended Data Figure 2 | Additional data and analysis for transient stress 
responses. a, Fold-change in expression in response to different stresses for 
the synthetic combinatorial target gene for three genetic backgrounds: no 
deletion (MSN2 MIG1, data from Fig. 2f), msn2 deletion, and mig1 deletion. 
b, Similar plot for the endogenous target gene GSY1. Cells were treated with 
designated stress for 10 min and ≥3 biological replicates were averaged 
(error bar indicates s.e.m.). P value was obtained from two-tailed t-test. 
c, d, Averaged transcriptional responses of GSY1-24xPP7 in response to 100 mM 
NaCl (c) or 2.5% ethanol (d) for three genetic backgrounds: no deletion, 
mig1 deletion, and msn2 deletion. Averaged nuclear localization traces of 
Msn2-mKO2 and Mig1-mCherry for the ‘no deletion’ strain are shown on the 
top panels. e, Averaged nuclear localization traces of Msn2-mKO2 and 
Mig1-mCherry (top) and corresponding transcriptional responses for GSY1-24xPP7 
in response to glucose downshift (from 0.2% to 0.1%). Shading in 
c–e indicates 95% confidence intervals of the mean. f–k, RNA-seq analysis 
(see Methods and Supplementary Discussion for more details). f, log₂ fold-changes 
(LFC) in gene expression of 31 identified combinatorial targets 
(including GSY1; brown circle, indicated by green arrow) in response to NaCl 
(x axis) and ethanol (y axis) for wild-type background (that is, no deletion 
of either MSN2 or MIG1). g, The differences in LFC between wild-type and 
mig1 deletion for both NaCl (x axis) and ethanol (y axis). h, The differences 
in LFC between wild-type and msn2 deletion for both NaCl (x axis) and 
ethanol (y axis). i, The effect of Msn2 for each target was plotted against the 
corresponding number of Msn2 binding sites. j, Analogous plot for the effect of 
Mig1 binding sites. k, Correlation coefficients between the effect of Msn2 or 
Mig1 and the number of Msn2 or Mig1 binding motifs, respectively. Error bars in 
f–h indicate standard deviations from two biological replicates. Error bars in 
k represent 95% confidence intervals from bootstrap. 

Extended Data Figure 3 | Example 3-colour single-cell traces under steady-state 
conditions, and schematic diagram of pulse-triggered averaging analysis. 
a, b, Example 3-colour single-cell traces for synthetic (a) and natural (b) 
promoters under constant glucose (0.05%). Two cells are shown for each 
promoter. For each cell, nuclear localization traces are shown on the top 
and PP7-2×GFP transcriptional output signal is shown on the bottom. 
c, Schematic illustration of pulse-triggered averaging analysis. Msn2 pulses 
were identified (green arrows) and sorted based on their relationship with 
the Mig1 signal within a 21 min time window (see Methods). Horizontal green 
and red lines underneath top time trace plot indicate width of identified Msn2 
and Mig1 pulses, respectively. Msn2 pulses whose peaks overlap with Mig1 
pulses were categorized as overlapping events (orange arrows) while the rest of 
Msn2 pulses were categorized as non-overlapping events (purple arrows). 
Overlapping and non-overlapping events were then averaged separately 
(bottom schematics). 

Extended Data Figure 4 | Pulse-triggered averaging analysis for control 
promoters and for delayed pulse timing events. a, b, Plots analogous to those 
in Fig. 3d, e for additional synthetic and natural promoters. The GSY1 promoter 
was examined in strains with Msn2 or Mig1 zinc-finger deletions. For gene 
expression, areas under curves were analysed and presented in c, d. c, Relative 
pulse timing-dependent gene expression occurs for combinatorial promoters 
but not pure Msn2 or Mig1 target promoters. Bars represent integrated gene 
expression based on area under curve from Fig. 3d, e and a, b. d, Plot analogous 
to c for the natural GSY1 target gene. Binding of the transcription factors was 
abolished by mutations in zinc finger DNA-binding domains, indicated by 
crosses. e, Distributions of gene expression (estimated as integrated area under 
curve) per non-overlapping or overlapping event for both synthetic and natural 
combinatorial promoters (real data (solid) versus control data (dashed); top) 
and ratios between real and control data (bottom). Control data were measured 
from a scrambled population of cells. For the real data, the distributions of 
non-overlapping and overlapping events are significantly different (by 
Kolmogorov-Smirnov test) with P values of 2.1 X 10-1” and 1.2 X 107° for 
synthetic and natural promoters, respectively. In contrast, for control data, they 
are not significantly different (P values: 0.4520 and 0.9888). For the calculation 
of ratios, averages of the non-overlapping and overlapping control data were 
used as control. f–i, Pulse-triggered averaging analysis of ‘delayed’ events in 
which an Msn2 pulse is followed by a Mig1 pulse (see Supplementary Discussion 
for details). f, Overlapping events were subdivided into delayed and non-delayed 
events, as shown. Corresponding mean Msn2 and Mig1 signals as well as 
transcriptional responses were plotted for both synthetic and natural promoters. 
A similar classification was performed for non-overlapping events (g). Area 
under curve for f, g was plotted for direct comparison of gene expression 
between delayed and non-delayed pulse timing events (h, i). Shading and error 
bars indicate 95% confidence intervals of the mean. Schematic promoters 
indicate whether the synthetic or natural GSY1 promoter was used in each case. 

Extended Data Figure 5 | Analysis of mean gene expression dependence 
on time interval (continuous relative timing) between Msn2 and Mig1 
pulses. a, b, Mean expression from both synthetic (a) and natural (b) target 
promoters depends on the time interval between Msn2 and Mig1 pulses (that is, 
interval between the peak of an Msn2 pulse and the edge of the nearest Mig1 
pulse). For each time interval, mean expression values were determined by 
integrating the area under the baseline-subtracted averaged PP7 traces, and 
averaging within bins of similar pulse interval. c, Specifically, Msn2 pulses were 
categorized on the basis of the pulse interval between Msn2 and Mig1 and the 
corresponding PP7 signals were averaged and their areas under curve were 
plotted (Methods). The pulse interval ranges from −9 to 9 min, which 
represents the bin centre of each 2-min bin (for example, 1 min represents 
the range 2 min ≥ interval > 0 min), with the 0 min interval representing 
overlapping events. Both >10 or <−10 min intervals represent events where 
Msn2 pulses were not surrounded by any Mig1 pulses within 21 min. d, e, Msn2 
and Mig1 regulation are both necessary for continuous relative timing-dependent 
gene expression under constant glucose condition. Analysis similar 
to a, b was performed on synthetic Msn2- and Mig1-specific promoters (d) and 
natural GSY1 promoter with Msn2 or Mig1 zinc finger deletion mutants (e). 
Shading and error bars indicate 95% confidence intervals of the mean. 

Extended Data Figure 6 | Example single-cell nuclear localization traces for 
different constant glucose conditions. Two single-cell traces are shown for 
each indicated glucose level (boxed percentage values). Cells were switched to 
indicated glucose level from 0.2% glucose at 110 min before time zero (that is, 
beginning of movie acquisition). 

Extended Data Figure 7 | Characterization of Msn2 and Mig1 pulses and 
average cross-correlation functions between Msn2 and Mig1 in individual 
cells across different constant glucose concentrations. a, Pulse frequency, 
amplitude, and duration analysis. Single-cell traces at each glucose level were 
analysed and the mean frequency, amplitude and duration for both Msn2 
and Mig1 were plotted. b, c, Distributions of total number of pulses per trace 
across glucose concentrations (b), along with corresponding fits to Poisson 
distributions (shown as cumulative distributions, c). Kolmogorov-Smirnov 
(KS) tests showed that these distributions differ significantly from Poisson 
distributions (P< 10 '°). d, e, Analogous plots for the distributions of inter-pulse 
time intervals (d), and corresponding fits to exponential distributions 
(e). These distributions differ significantly from exponential distributions 
according to KS tests (P< 10°”). f, Distributions of pulse duration for Msn2 
and Mig1 across glucose concentrations. g, Cross-correlation function 
(solid blue) of Msn2 and Mig1 nuclear localization traces, that is, 
cross-corr(Msn2, Mig1) (Methods). Dashed blue lines represent negative 
(independent) controls, calculated by scrambling the Msn2-Mig1 trace 
pairs within a population of cells (that is, cross-correlating Msn2 from one cell 
with Mig1 from another, randomly chosen, cell). Shading and error bars 
indicate 95% confidence intervals of the mean. The number of cells analysed in 
each glucose concentration: 1,511 (0.4%), 3,475 (0.2%), 2,605 (0.15%), 2,075 
(0.1%), 3,034 (0.075%), 2,768 (0.05%), 1,392 (0.025%), 2,055 (0.02%), and 1,906 
(0.0125%). h, Two different localization metrics show similar Msn2 and Mig1 
state distributions. Top left, histogram of the intensity score for Msn2 and Mig1 
shows long-tailed distributions for both proteins with peaks around zero (basal 
state). Insert, zoomed-in view of the tails. Top right, analogous plots for the 
signal proximity score also show long-tailed distributions with clear basal 
states. Signal proximity is the inverse of the distance-based localization metric 
described in the Methods section. High signal proximity indicates that the top 
10 brightest pixels in the cell are close to each other. Bottom, signal intensity 
positively correlates with signal proximity for both Msn2 and Mig1, suggesting 
that these two independent scores show related features. These data are for cells 
at 0.05% glucose. Similar behaviours are observed across other glucose 
concentrations. 

Extended Data Figure 8 | Further characterization of relative pulse 
timing modulation under steady-state conditions. a, Left, experimentally 
measured overlapping fraction (solid black) can be compared to minimum and 
maximum possible overlapping fractions (bottom and top dashed lines, 
respectively). The expected overlapping fraction for independent Msn2 and 
Mig1 dynamics is determined in two ways: either computed from the Mig1 duty 
cycle (dashed black), or measured from scrambled populations (dashed red). 
Minimum and maximum possible fractions were calculated with the measured 
duty cycles of Msn2 and Mig1 pulses. Right, the ratios of measured overlapping 
fraction to expected overlapping fraction across glucose concentrations. 
b, Relative pulse timing modulation explains gene-expression dependence on 
glucose level for combinatorial target promoters. Black circles represent mean 
expression of 5 genes measured by qPCR (see Methods for normalization). 
Data were fit with three models, as indicated. See Methods and Supplementary 
Discussion for more details on binary and continuous timing models. R² values 
for fits are indicated in corresponding colours. Error bars indicate s.e.m. 
calculated from 3 biological replicates. c, Expression data for the 5 individual 
genes fit to the binary timing (dashed lines; R² values in dashed box) as well as 
continuous timing (solid lines; R² values in solid box) models. d, Analysis of 
RNA-seq expression data across 9 glucose concentrations. The averaged 
expression levels from 28 of the 31 identified combinatorial targets (Extended 
Data Fig. 2f–k) were fit with the binary or continuous timing modulation 
models (left and right plots, respectively). Three genes were excluded because 
they did not display a monotonic dependence on glucose (YER067C-A, 
YKR098C, YLR109W). In this analysis, parameter b was independently 
estimated from an msn2 mutant at 0.2% glucose (samples collected on the same 
day). e, Glucose level modulates the fraction of delayed pulse timing events (see 
also Extended Data Fig. 4 and Supplementary Discussion). Total fractions of 
delayed overlapping (see Extended Data Fig. 4e, left) and delayed non-overlapping 
pulse events (see Extended Data Fig. 4f, left) were plotted across 
glucose concentrations. Expected fractions were computed from ‘scrambled’ 
populations where Msn2 and Mig1 dynamics are, by construction, 
independent. f, g, Glucose concentration also modulates relative pulse timing in 
a control strain without deletions of msn4 and mig2. f, Pulse characteristics of 
both Msn2 and Mig1 for varying glucose concentrations. g, Measured versus 
expected overlapping fractions across different glucose concentrations (see 
a) for the wild-type background that was not deleted for Msn4 and Mig2. Error 
bars indicate 95% confidence intervals of the mean (except for b–d). The 
number of cells analysed for f, g: 618 (0.4%), 541 (0.2%), 714 (0.025%), and 775 
(0.0125%). 

Extended Data Figure 9 | Additional effects of stress level and type on 
transient and steady-state responses. a, Stress level does not modulate relative 
pulse timing during transient responses. Averaged nuclear localization traces of 
Msn2-mKO2 and Mig1-mCherry during transient response to 50 mM NaCl 
(left) or 1.25% ethanol (right) are shown (see Fig. 2b, c). b, Additional stresses 
modulate relative timing during steady-state responses. Changes in pulse 
characteristics of both Msn2 and Mig1 in response to the addition of 
100 mM NaCl or 2.5% ethanol during steady-state growth at 0.05% glucose. 
c, Measured (black) versus expected (grey) overlapping fractions for the same 
3 conditions as in b. d, Averaged cross-correlation between Msn2 and Mig1 
time traces for the same three conditions. See Supplementary Discussion for 
additional discussion. Shading and error bars indicate 95% confidence intervals 
of the mean. The number of cells analysed for b, d: 2,768 (0.05% glucose), 2,178 
(0.05% glucose with 100 mM NaCl) and 2,115 (0.05% glucose with 2.5% 
ethanol). 


©2015 Macmillan Publishers Limited. All rights reserved 


[Extended Data Figure 10 image, panels a–g, not reproduced. Recoverable labels: panel d, ‘Pulse interval distributions’ (x axis: pulse interval between Msn2 and Mig1, min); panel f, ‘Concentration-based vs. Time-based regulation’; glucose concentration axes at 0.2%, 0.1% and 0.025%.] 


Extended Data Figure 10 | A role for Glc7 in active relative pulse timing 
modulation under constant glucose conditions and functional aspect of 
relative pulse timing modulation. a, Schematic of potential mechanisms for 
Glc7-dependent relative pulse timing modulation (top) and construct design 
(bottom). Overlapping pulsing of Msn2 and Mig1 could be induced by either a 
common kinase/phosphatase (such as Glc7) that directly or indirectly activates 
both Msn2 and Mig1 localization, or by an upstream input (yellow circle) 
that simultaneously regulates kinases/phosphatases responsible for Msn2 and 
Mig1 localization. To analyse the role of GLC7 in relative pulse timing, we 
constructed a strain in which the normal GLC7 promoter is replaced by a 
copper-inducible promoter, as shown. b, qPCR characterization of the 
inducible GLC7 strain across three glucose concentrations. Basal copper level in 
the media reduced GLC7 expression to less than 50% of its wild-type level. 
Addition of 10 µM CuSO4 restored the expression to 110% to 140% of wild-type 
level. c, Changes in pulse characteristics in response to GLC7 reduction (red) 
and restoration (blue), compared to wild-type (black). d, Corresponding 
changes in pulse interval distribution. Pulse interval was calculated as the 
distance between the peak of a given Msn2 pulse and the peak of its closest Mig1 
pulse within a 21 min window. e, Averaged nuclear localization traces of Msn2- 
mKO2 (green) and Mig1-mCherry (red) in response to 2.5% ethanol addition 
(dashed line) for the GLC7 reduction mutant. See Supplementary Discussion 
for additional discussion. Error bars in b indicate s.e.m. from 3 biological 
replicates. For c-e, shading and error bars indicate 95% confidence intervals of 
the mean. The number of cells analysed in the mutant strain: 671 (0.2% glucose 
without Cu2+), 540 (0.1% glucose without Cu2+), 719 (0.025% glucose without 
Cu2+), 756 (0.2% glucose with Cu2+), 643 (0.1% glucose with Cu2+), and 
656 (0.025% glucose with Cu2+). f-h, Functional aspect of relative pulse timing 
modulation (see Supplementary Note). f, Concentration-based versus time- 
based regulation. Input modulates the regulator concentration (left) versus the 
fraction of regulator ON time (right). g, Modulation of relative pulse timing in 
time-based regulation results in changes in the effective protein-protein 
cooperativity. Increasing protein-protein cooperativity in concentration-based 
regulation changes the probability of co-binding of TFa and TFb (left). 
Increasing overlapping pulsing in time-based regulation leads to qualitatively 
similar changes in the probability of co-binding (right). Protein cooperativity 
parameter ωab was increased from 1 to 2 for the left plots. Overlap fraction 
was increased from θaθb to 2 × θaθb for the right plots (ωab = 1). Ka = Kb = 5 
for both left and right. h, Schematic, relative pulse timing modulation affects 
the relative probability of simultaneous binding of two transcription factors 
to a target promoter (right). This effect is analogous to that generated by 
cooperative protein-protein interactions (left). Stronger protein-protein 
interactions or a higher overlap fraction can both increase the probability with 
which two transcription factors will be simultaneously bound at neighbouring 
sites (schematic pie charts). 

Ion channels enable electrical 
communication in bacterial communities 


Arthur Prindle1, Jintao Liu1*, Munehiro Asally2*, San Ly1, Jordi Garcia-Ojalvo3 & Gürol M. Süel1 


The study of bacterial ion channels has provided fundamental insights into the structural basis of neuronal signalling; 
however, the native role of ion channels in bacteria has remained elusive. Here we show that ion channels conduct 
long-range electrical signals within bacterial biofilm communities through spatially propagating waves of potassium. 
These waves result from a positive feedback loop, in which a metabolic trigger induces release of intracellular potassium, 
which in turn depolarizes neighbouring cells. Propagating through the biofilm, this wave of depolarization coordinates 
metabolic states among cells in the interior and periphery of the biofilm. Deletion of the potassium channel abolishes this 
response. As predicted by a mathematical model, we further show that spatial propagation can be hindered by specific 
genetic perturbations to potassium channel gating. Together, these results demonstrate a function for ion channels in 
bacterial biofilms, and provide a prokaryotic paradigm for active, long-range electrical signalling in cellular communities. 


Communication through electrical signalling is prevalent among 
biological systems, with one of the most familiar examples being the 
action potential in neurons that is mediated by ion channels. For 
many years, the study of bacterial ion channels has provided 
fundamental insights into the structural basis of such neuronal 
signalling. In particular, the prokaryotic potassium ion channel KcsA 
provided the first structural information on ion selectivity and 
conductance. More recently, it has been shown that bacteria possess 
many important classes of other ion channels, such as sodium 
channels, chloride channels, calcium-gated potassium channels and 
ionotropic glutamate receptors, similar to those found in neurons. 
However, the native role of these ion channels in bacteria has largely 
remained unclear. Efforts to uncover ion channel function in 
bacteria have identified roles in the extreme acid resistance response and 
in osmoregulation, yet ion-specific channels do not appear to be 
solely responsible for these cellular processes. It remains unclear 
whether ion channels can support other unique functions in 
prokaryotes. We hypothesized that studying bacteria in their native context, 
the biofilm community, may reveal new clues about the function of 
ion channels in bacteria. 

Bacterial biofilms are organized communities containing billions of 
densely packed cells. Such communities can exhibit fascinating 
macroscopic spatial coordination. However, it remains unclear how 
microscopic bacteria can communicate effectively over large 
distances. To investigate this question, we studied a Bacillus subtilis 
microbial community that was recently reported to undergo 
metabolic oscillations triggered by nutrient limitation. The oscillatory 
dynamics resulted from long-range metabolic co-dependence 
between cells in the interior and periphery of the biofilm (Fig. 1a). 
Specifically, interior and peripheral cells compete for glutamate, while 
sharing ammonium. As a result, biofilm growth halts periodically, 
increasing nutrient availability for the sheltered interior cells. 
Interestingly, glutamate (Glu) and ammonium (NH4+) are both 
charged metabolites, whose respective uptake and retention is known 
to depend on the transmembrane electrical potential and proton 
motive force. Therefore, we wondered whether metabolic 
coordination among distant cells within the biofilm might also involve a 
form of electrochemical signalling. 


Oscillations in membrane potential 


To monitor long-range electrical fluctuations in the bacterial com- 
munity as a function of space and time, we grew biofilms in an uncon- 
ventionally large microfluidic device (Fig. 1b and ‘Microfluidics’ 
section of Methods). To measure electrical signalling, we used the 
fluorescent cationic dye thioflavin T (ThT) to quantify membrane 
potential within the biofilm. ThT is positively charged and can be 
retained in cells because of the negative electrical membrane potential 
inside cells. Thus, cells with a negative membrane potential will retain 
more ThT, allowing it to act as a Nernstian voltage indicator. 
We confirmed that ThT faithfully reports the membrane potential 
by comparing it to an established reporter of membrane potential 
in bacteria, 3,3'-dipropylthiadicarbocyanine iodide (DiSC3(5)) 
(Extended Data Fig. 1a). We found that ThT has an approximately 
threefold higher sensitivity to changes in membrane potential 
compared to DiSC3(5) (Extended Data Fig. 1a, inset). Furthermore, we 
exposed cells to minor changes in external pH, which is known to 
alter membrane potential, and observed the expected changes in 
ThT (Extended Data Fig. 1b). Therefore, ThT accurately reports on 
changes in membrane potential for bacteria residing in biofilms. 
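
The Nernstian behaviour invoked above can be illustrated with a short numerical sketch. It assumes that ThT behaves as an ideal, passively distributed monovalent cation at equilibrium, so that its inside-to-outside concentration ratio follows exp(-zFV/RT). The function name and the example voltages are hypothetical and are not taken from the study; the temperature is set to the 30 °C used for biofilm growth.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1

def nernst_accumulation(v_membrane_mv, z=1, temp_k=303.15):
    """Equilibrium inside/outside concentration ratio of an ion of charge z.

    v_membrane_mv is the potential of the cell interior relative to the
    exterior, in millivolts. Assumes ideal Nernstian partitioning.
    """
    v_volts = v_membrane_mv / 1000.0
    return math.exp(-z * F * v_volts / (R * temp_k))

# Illustrative membrane potentials (assumed values, not measurements).
for v_mv in (-60, -100, -140):
    ratio = nernst_accumulation(v_mv)
    print(f"{v_mv:5d} mV -> inside/outside ratio {ratio:8.1f}")
```

Under these assumptions a more negative interior gives a larger accumulation ratio, which is why a cationic dye retained in this way can serve as a readout of membrane potential.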

We next investigated changes in membrane potential during meta- 
bolic oscillations. In particular, quantitative measurements of ThT 
fluorescence showed global and self-sustained oscillations consistent 
with the reported period of the metabolic oscillations (Fig. 1c, 
Supplementary Video 1 and Extended Data Fig. 1c). Furthermore, 
oscillations in ThT could be quenched by supplementation of the 
media with glutamine, which bypasses the need for glutamate and 
ammonium (Extended Data Fig. 1d). These data show a connection 
between metabolic oscillations and membrane potential. Notably, 
oscillations in membrane potential were synchronized among even 
the most distant regions of the biofilm community (Fig. 1d, e). We 
wondered whether active electrochemical signalling could be respons- 
ible for this long-range synchronization. 


1Division of Biological Sciences, University of California San Diego, California 92093, USA. 2Warwick Integrative Synthetic Biology Centre, School of Life Sciences, University of Warwick, Coventry CV4 7AL, 
UK. 3Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain. 

Figure 1 | Biofilms produce synchronized oscillations in membrane 
potential. a, Biofilms generate collective metabolic oscillations resulting from 
long-range metabolic interactions between interior and peripheral cells. It 
remains unclear how microscopic bacteria are capable of communicating over 
such macroscopic distances within biofilm communities. b, Schematic of the 
microfluidic device used throughout this study (left). Phase contrast image of a 
biofilm growing in the microfluidic device with the cell trap highlighted in 
yellow (right). Scale bar, 100 µm. c, Global oscillations in membrane potential, 
as reported by thioflavin T (ThT), within the biofilm community. ThT is 
positively charged but not known to be actively transported, so it can be 

Active propagation of potassium signal 

Changes in membrane potential involve the movement of charged 
species across the cellular membrane. We suspected the involvement 
of potassium, since it is the most abundant cation in all living cells 
and has been implicated to have a role in biofilm formation. 
B. subtilis uses active potassium transport mechanisms to concentrate 
intracellular potassium at approximately 300 mM. This 
intracellular concentration is nearly 40 times the external media 
concentration. Consequently, sudden release of this potassium gradient 
would increase extracellular potassium concentration and generate 
a change in the membrane potential. Accordingly, we used a 
fluorescent chemical potassium dye, asante potassium green-4 (APG-4), 
to measure the extracellular concentration of potassium in the biofilm 
(Fig. 2a and Extended Data Fig. 2a, b). We observed global oscillations 
in APG-4 that correlated with membrane potential, which suggests 
that the membrane potential oscillations could involve the release of 
potassium (Fig. 2b, c and Supplementary Video 2). In agreement with 
this finding, oscillations in extracellular potassium extended beyond 
the biofilm to the surrounding growth media (Extended Data Fig. 2c). 
We also measured the dynamics of sodium, another ion commonly 
used by cells to modulate membrane potential, and observed no 
oscillations (Extended Data Fig. 2d–f). Together, these data suggest 
that potassium has a role in the synchronized oscillations in mem- 
brane potential. 

Furthermore, we directly tested that oscillations in membrane 
potential were driven by flow of potassium across the cell membrane. 
Specifically, we clamped net potassium flux across the cell membrane 
by supplementing the growth media with 300 mM KCl (matching the 
intracellular potassium concentration) (Fig. 2d). When we applied 
this chemical potassium clamp, oscillations in membrane potential 
abruptly halted (Fig. 2e). Applying this clamp together with valino- 
mycin, a potassium ionophore that acts as potassium-specific carrier 
in the cellular membrane, yielded a similar quenching of oscillations 

retained in cells due to their negative membrane potential inside the cell. 
ThT fluorescence increases when the inside of the cell becomes more negative, 
and thus ThT is inversely related to the membrane potential. Scale bar, 
0.15 mm. Representative images shown are taken from over 75 independent 
biofilms. a.u., arbitrary units. d, Membrane potential oscillations are highly 
synchronized even between the most distant regions of the biofilm. To analyse 
synchronization, the edge region of the biofilm was identified and straightened 
(left) then plotted over time (right). e, Time traces of the heat map shown in d. 
Indicated in bold is the mean of 30 traces. 


(Extended Data Fig. 3a, b). Therefore, changes in the electrochemical 
potential for potassium appear to be required for the observed oscilla- 
tions in membrane potential. 
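
The logic of the chemical potassium clamp can be checked with the standard Nernst equation for potassium. The sketch below is illustrative only: it takes the approximately 300 mM intracellular concentration from the text, assumes an external concentration of 7.5 mM (an assumed value chosen because the text describes the gradient as nearly 40-fold), and uses 30 °C. The function name is hypothetical.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1

def potassium_nernst_mv(k_out_mm, k_in_mm, temp_k=303.15):
    """Nernst (reversal) potential for K+ in millivolts, inside relative to outside."""
    return 1000.0 * (R * temp_k / F) * math.log(k_out_mm / k_in_mm)

K_IN_MM = 300.0               # intracellular potassium quoted in the text
K_OUT_MM = K_IN_MM / 40.0     # assumed external value for a 40-fold gradient

e_k_normal = potassium_nernst_mv(K_OUT_MM, K_IN_MM)
e_k_clamped = potassium_nernst_mv(300.0, K_IN_MM)   # medium supplemented to 300 mM

print(f"E_K with 40-fold gradient: {e_k_normal:.1f} mV")
print(f"E_K under 300 mM clamp:    {e_k_clamped:.1f} mV")
```

With equal concentrations on both sides the driving force for net potassium flux vanishes, which is consistent with the clamp halting the oscillations; with the assumed 40-fold gradient the reversal potential is close to -96 mV.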

Next, we determined whether cells could actively propagate the 
extracellular potassium signal through the biofilm to sustain long- 
range communication. While diffusive signals decay over space and 
time, active signalling processes can amplify the signal, avoiding such 
decay (Fig. 2f). To determine which of these processes may be oper- 
ating in the biofilm, we observed the propagation of the extracellular 
potassium signal (Fig. 2g). Results show that the signal travels at a 
constant rate of propagation (Extended Data Fig. 3c, d). Furthermore, 
the amplitude of the signal does not decay with distance travelled, in 
contrast to what is predicted for passive potassium diffusion (Fig. 2h). 
These findings are consistent with a process in which cells actively 
propagate the potassium signal. Together, these results suggest that 
the biofilm synchronizes global oscillations in membrane potential by 
an active signalling process involving potassium ions. 
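
The contrast between passive and active propagation drawn above can be made quantitative with the textbook solution for an instantaneous point release diffusing in two dimensions. This is a sketch under stated assumptions, not the calculation used for the figure: the release size M is set to 1 in arbitrary units, the diffusion coefficient is the biofilm value quoted in the Methods, and all function names are hypothetical. For this solution the concentration at distance r peaks at t = r²/(4D), with a peak value of M/(πe r²), so a purely diffusive signal loses amplitude as 1/r².

```python
import math

# Diffusion coefficient for potassium in biofilms quoted in the Methods (cm^2 s^-1).
D_BIOFILM_CM2_S = 13.8e-6

def concentration_2d(r_cm, t_s, m=1.0, d=D_BIOFILM_CM2_S):
    """Concentration at distance r and time t after a point release of size m in 2D."""
    return m / (4.0 * math.pi * d * t_s) * math.exp(-r_cm ** 2 / (4.0 * d * t_s))

def peak_time_s(r_cm, d=D_BIOFILM_CM2_S):
    """Time at which the passive signal is largest at distance r (from dC/dt = 0)."""
    return r_cm ** 2 / (4.0 * d)

def peak_amplitude(r_cm, m=1.0, d=D_BIOFILM_CM2_S):
    """Largest concentration ever reached at distance r for a passive signal."""
    return concentration_2d(r_cm, peak_time_s(r_cm, d), m, d)

reference = peak_amplitude(0.025)   # amplitude at 0.25 mm, used for normalization
for r_mm in (0.25, 0.5, 1.0, 1.5):
    r_cm = r_mm / 10.0
    print(f"r = {r_mm:4.2f} mm: peak at {peak_time_s(r_cm) / 60.0:6.1f} min, "
          f"relative amplitude {peak_amplitude(r_cm) / reference:.3f}")
```

A signal whose amplitude stays roughly constant over comparable distances therefore cannot be explained by this passive picture alone.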


Potassium ion-channel-mediated signalling 


Motivated by our findings, we explored the role of ion channels in the 
observed potassium signalling. We focused on YugO, the only experi- 
mentally described potassium channel in B. subtilis, which is also 
reported to be important for biofilm formation. Potassium flux 
through YugO is gated by an intracellular TrkA domain, known to 
be regulated by the metabolic state of the cell. Accordingly, we 
hypothesized that metabolic limitation could form the initial trigger 
for YugO activation. Specifically, since glutamate limitation is known 
to drive the underlying metabolic oscillations, we anticipated that 
transient removal of glutamate could initiate potassium release. To 
test this, we transiently deprived cells of glutamate and measured 
extracellular potassium in both wild-type and yugO deletion strains 
(see ‘Strains’ section of the Methods). As expected, we observed 
extracellular potassium increase for wild-type but not the yugO deletion 

Figure 2 | Potassium release is involved in active signal propagation within 
the biofilm. a, An extracellular fluorescent chemical dye (APG-4) reports 
the concentration of potassium in the media (Extended Data Fig. 2a, b). For 
comparison, the same cells are shown stained with ThT, which is inversely 
related to the membrane potential. These images depict cells at the peak of 
the ThT oscillation cycle. Representative images are selected from six 
independent experiments. Scale bar, 2 µm. b, Global oscillations in extracellular 
potassium throughout the biofilm. A white line indicates the edge of the 
biofilm. Representative images are selected from six independent experiments. 
Scale bar, 0.2 mm. c, Oscillations in membrane potential and extracellular 
potassium are synchronized, suggesting that potassium release is involved in 
global membrane potential oscillations. ThT is inversely related to the 
membrane potential. Representative traces are taken from the experiment 
shown in b. d, A chemical potassium clamp (300 mM KCl, matching the 
intracellular concentration) prevents the formation of potassium 


strain (Fig. 3a). These findings suggest that glutamate limitation can 
trigger the potassium signal via the YugO potassium channel. 

Next, we investigated whether YugO also has a role in the active 
propagation of the potassium signal. To test this, we measured the 
response of wild-type and yugO deletion strains to transient bursts of 
external potassium (300 mM KCl). As expected, potassium exposure 
first resulted in a short-term membrane potential depolarization in 
both strains. However, in the wild-type strain this initial depolariza- 
tion was typically followed by an extended hyperpolarization phase, 
which was not observed in the yugO deletion strain (Fig. 3b). This 
period of hyperpolarization was accompanied by an increase in extra- 
cellular potassium (Extended Data Fig. 4a). Together, these data indi- 
cate that potassium exposure triggers a release of intracellular 
potassium through YugO. Exposure to an equivalent concentration 
of sorbitol (an uncharged solute) did not elicit an equivalent response, 
ruling out purely osmotic effects (Extended Data Fig. 4b). Therefore, 
YugO appears to have a role in propagating the extracellular 
potassium signal within the biofilm. 

electrochemical gradients across the cellular membrane. e, Clamping net 
potassium flux quenches oscillations in membrane potential. Representative 
trace is selected from two independent experiments. f, Illustration of the 
differences between passive signalling (diffusion) and active signalling. When 
cells passively respond to a signal, the range that the signal can propagate is 
limited due to the decay of signal amplitude. In contrast, when cells actively 
respond by amplifying the signal, propagation can extend over greater 
distances. g, We measured propagation of extracellular potassium by 
measuring APG-4 in time and along a length of approximately 1.5 mm within 
the biofilm. h, Extracellular potassium amplitude is relatively constant as the 
signal propagates, in contrast to the predicted amplitude decay of a passive 
signal. Representative data selected from six independent experiments. The 
diffusion line is calculated using the 2D diffusion equation and the diffusion 
coefficient for potassium within biofilms (Supplementary Information). 


Mathematical modelling of electrical signalling 


Our data thus point to a proposed mechanism where metabolically 
stressed cells release intracellular potassium, and the resulting ele- 
vated extracellular potassium imposes further metabolic stress onto 
neighbouring cells (Fig. 3c). In B. subtilis, glutamate is co-transported 
with two protons by the GltP transporter and this process depends on 
the proton motive force. Potassium-mediated depolarization of the 
membrane potential can transiently reduce the electrical component 
of the proton motive force, and thereby lower glutamate uptake and 
intracellular ammonium retention. Therefore, potassium-mediated 
signalling could propagate metabolic stress onto distant 
cells (Fig. 3c, right). Accordingly, hyperpolarization triggered by 
YugO activation may represent a cellular response to enhance 
glutamate uptake or ammonium retention. This notion is supported 
by our finding that the response to extracellular potassium can be 
abolished by growing cells in glutamine, an uncharged metabolite and 
preferred nitrogen source that bypasses the need for glutamate and 
ammonium (Extended Data Fig. 4c). This result further supports the 

Figure 3 | The molecular mechanism of signal propagation involves 
potassium channel gating. a, yugO is a potassium channel in B. subtilis that is 
gated intracellularly by a trkA domain, which is regulated by the metabolic state 
of the cell. Withdrawing glutamate (the sole nitrogen source in MSgg 
media) induces an increase in extracellular potassium (APG-4) for wild-type 
(WT) but not the yugO deletion strain. Error bars indicate the mean ± s.d. for 
three independent biofilms each. b, An external potassium shock (300 mM 
KCl) induces a short-term membrane potential depolarization in both wild- 
type and yugO deletion strains. However, in the wild type this initial 
depolarization was followed by hyperpolarization, which is not observed in the 
yugO deletion strain (mean ± s.d. for 12 traces drawn from 3 biofilms each). 
ThT is inversely related to the membrane potential. c, Proposed model for 
potassium signalling. The initial trigger for potassium release is metabolic stress 
caused by glutamate limitation. External potassium depolarizes neighbouring 
cells, producing further nitrogen limitation by limiting glutamate uptake, and 
thus produces further metabolic stress. This cycle results in cell-cell 


specific link between potassium-mediated electrical signalling and 
metabolic stress. 

To determine whether the proposed potassium-channel-based 
mechanism is sufficient to account for the observed propagating 
pulses of electrical activity, we turned to mathematical modelling. 
Specifically, we considered a minimal conductance-based model 
describing the dynamics of the cell’s membrane potential in terms 
of a single potassium channel and a leak current (see ‘Mathematical 
Model’ section of the Supplementary Information). Consistent with 
our experimental results, this simple model exhibits transient depol- 
arization followed by hyperpolarization in response to local increases 
in extracellular potassium concentration (Fig. 3d). Furthermore, the 
model shows long-range propagation of these excitations without 
decay in the amplitude of membrane potential oscillations (Fig. 3e). 
Therefore, the proposed mechanism is mathematically sufficient to 
qualitatively account for the observed membrane potential dynamics 
and active propagation in space. 
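
To make the phrase "conductance-based model" concrete, the sketch below integrates a generic equation of the Hodgkin-Huxley type with one potassium conductance and one leak conductance. It is not the authors' model, which is specified in the Supplementary Information and includes gating dynamics; here the gating variable n is held fixed, the n⁴ dependence is the usual Hodgkin-Huxley convention adopted as an assumption, the membrane capacitance is set to 1, and every parameter value is arbitrary. With fixed gating the sketch reproduces only the depolarization caused by raising the potassium reversal potential, not the subsequent hyperpolarization.

```python
def steady_state_mv(g_k, g_l, v_k_mv, v_l_mv, n=1.0):
    """Membrane potential at which potassium and leak currents cancel."""
    g_open = g_k * n ** 4
    return (g_open * v_k_mv + g_l * v_l_mv) / (g_open + g_l)

def simulate_mv(v0_mv, g_k, g_l, v_k_mv, v_l_mv, n=1.0, dt=0.01, steps=5000):
    """Forward-Euler integration of dV/dt = -g_k n^4 (V - V_K) - g_l (V - V_L)."""
    v = v0_mv
    trace = [v]
    for _ in range(steps):
        dv_dt = -g_k * n ** 4 * (v - v_k_mv) - g_l * (v - v_l_mv)
        v += dt * dv_dt
        trace.append(v)
    return trace

# Arbitrary illustrative parameters (assumptions, not fitted values).
G_K, G_L = 1.0, 0.2
V_L_MV = -50.0

# Resting state with a strongly negative potassium reversal potential.
rest = simulate_mv(-50.0, G_K, G_L, v_k_mv=-96.0, v_l_mv=V_L_MV)
# Raising extracellular potassium moves the reversal potential towards zero.
shocked = simulate_mv(rest[-1], G_K, G_L, v_k_mv=0.0, v_l_mv=V_L_MV)

print(f"resting potential:    {rest[-1]:.2f} (arbitrary mV scale)")
print(f"after potassium rise: {shocked[-1]:.2f} (arbitrary mV scale)")
```

Adding a gating variable that responds to metabolic stress, as the proposed mechanism requires, would be the next step towards a model capable of the transient hyperpolarization described in the text.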

The model also predicts that reduced efficiency of the potassium 
channel function could lead to degradation in long-range commun- 
ication (Fig. 3e). Since a complete yugO deletion interferes with 
development of large biofilms, we constructed a strain in which 
we deleted the TrkA gating domain, leaving only the ion channel 
portion of YugO intact (see ‘Strains’ section of the Methods). 
Similarly truncated bacterial potassium channels have been shown 
to have altered gating and ion conductance. Indeed, the 
yugOΔtrkA mutant biofilms exhibited a reduced propagation of membrane 
potential oscillations (Fig. 3f and Supplementary Video 3). 

propagation of the potassium signal. d, A minimal conductance-based 
model describing the dynamics of the cell’s membrane potential in terms of 
a single potassium channel and a leak current. Consistent with our 
experimental results, this simple model exhibits transient depolarization 
followed by hyperpolarization in response to local increases in extracellular 
potassium concentration. e, The model predicts that manipulating channel 
gating and conductance will result in decaying amplitude in the spatial 
propagation of membrane potential oscillations. f, Maximum intensity 
projection of membrane potential change illustrating attenuated 
communication within the biofilm in a yugOΔtrkA deletion compared to wild-type 
biofilms (top). Heat map of oscillations taken from wild-type and yugOΔtrkA 
mutant biofilms (bottom). Representative images are taken from three 
independent biofilm experiments in which wild-type and yugOΔtrkA biofilms 
are compared head-to-head. Scale bars, 8 µm. g, Quantification of normalized 
pulse amplitude from wild-type (n = 8 pulses) and yugOΔtrkA (n = 12 
pulses) mutant biofilms (mean ± s.e.m.). 

Specifically, in contrast to wild type, the yugOΔtrkA mutant shows 
decay in the signal amplitude from the interior of the biofilm to the 
cells at the periphery, which is also consistent with model predictions 
(Fig. 3g). Thus, YugO channel gating appears to promote efficient 
electrical communication between distant cells. 


Discussion 


Our findings suggest that bacteria use potassium ion-channel-mediated 
electrical signals to coordinate metabolism within the biofilm. The ensu- 
ing ‘bucket brigade’ of potassium release allows cells to rapidly commun- 
icate their metabolic state, taking advantage of a link between membrane 
potential and metabolic activity. This form of electrical communication 
can thus enhance the previously described long-range metabolic 
co-dependence in biofilms. Specifically, the wave of depolarization 
triggered by metabolically stressed interior cells would limit the ability of 
cells in the biofilm periphery to take up glutamate or retain ammonium, 
thereby allowing interior cells more access to these nutrients. This also 
provides a possible explanation for the observation that the yugO 
deletion strain has a defect in biofilm development. Interestingly, owing to 
the rapid diffusivity of potassium ions in aqueous environments, it is 
also conceivable that even physically disconnected biofilms could be 
capable of synchronizing their metabolic oscillations by a similar 
exchange of potassium ions. 

The role of ion-channel-mediated electrical communication has long 
been appreciated. While cation channels are found in all organisms 
and potassium is the dominant intracellular cation, electrical signalling 
is commonly viewed to be a property of neurons. However, several recent 
studies have suggested that in addition to traditional cell-to-cell 
communication systems such as quorum sensing, bacteria may use electron 
flux to communicate. The study described here, on the electrical 
coordination of metabolism in microbial communities, may in turn hold some 
general insights that extend beyond bacteria. For example, the 
connection between neuronal signalling and metabolic activity 
(neurometabolism) is an active area of research. Furthermore, depletion of 
glutamate, the most common excitatory neurotransmitter, also forms 
the initial trigger for these collective metabolic oscillations synchronized 
by potassium. Therefore, it is intriguing to think not only about the 
structural similarities between bacterial and human potassium ion 
channels, but also their possible functional similarities with respect to 
long-range electrical communication. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 

METHODS 


No statistical methods were used to predetermine sample size. 

Strains. All experiments were done using Bacillus subtilis NCIB 3610. The 
wild-type strain was a gift from W. Winkler (University of Maryland), and all other 
strains were derived from it and are listed in Extended Data Table 1. To make 
deletion strains, we used polymerase chain reaction (PCR) to amplify the desired 
regions from the wild-type strain. The PCR products were then put within the 
pER449 vector (gift from W. Winkler). For the trkA mutant, we deleted 
the C-terminal portion of yugO (amino acids 117-328), leaving only the 
N-terminal ion channel portion of yugO (amino acids 1-116). We identified 
the trkA region using Pfam (http://pfam.xfam.org/). All constructs were con- 
firmed by direct sequencing and then integrated into the chromosome of 
the wild-type strain by a standard one-step transformation procedure. 
Finally, chromosomal integrations were confirmed by colony PCR using the 
corresponding primers. 

Growth conditions. The biofilms were grown in MSgg medium, which contains 
5 mM potassium phosphate buffer (pH 7.0), 100 mM MOPS buffer (pH 7.0, 
adjusted using NaOH), 2 mM MgCl2, 700 µM CaCl2, 50 µM MnCl2, 100 µM 
FeCl3, 1 µM ZnCl2, 2 µM thiamine HCl, 0.5% (v/v) glycerol and 0.5% (w/v) 
monosodium glutamate. The MSgg medium was made from stock solutions on 
the day of the experiment, and the stock solution for glutamate was newly made 
weekly. 

Microfluidics. We followed methods similar to a previous study. Briefly, we 
used the CellASIC ONIX Microfluidic Platform and the Y04D microfluidic plate 
(EMD Millipore). We used a pump pressure of 1 psi with only one media inlet 
open, which corresponds to a flow speed of ~16 µm s⁻¹. On the day before the 
experiment, cells from −80 °C glycerol stock were streaked onto an LB agar plate 
and incubated at 37 °C overnight. The next morning, a single colony was picked 
from the plate and inoculated into 3 ml of LB broth and incubated in a 37°C 
shaker. After 2.5 h of incubation, the cell culture was centrifuged at 2,100 relative 
centrifugal force (rcf) for 1 min, and the cell pellet was re-suspended in MSgg and 
immediately loaded into microfluidic chambers. After loading, cells in the micro- 
fluidic chamber were incubated at 37 °C for 90 min, and then the temperature was 
kept at 30°C for the rest of the experiment. 

Time-lapse microscopy. The growth of the biofilms was recorded using phase- 
contrast microscopy. The microscopes used were Olympus IX83 and DeltaVision 
PersonalDV. To image entire biofilms, 10× objectives were used in most of the 
experiments. Biofilm phase contrast and fluorescence images were taken every 
10 min, except in Fig. 2g where images were taken every 5 min. To generate Fig. 3b 
and Extended Data Figs 4a-c, where high temporal resolution was required, 
images were taken every minute. Whenever fluorescence images were recorded, 
we used the minimum exposure time that still provided a good signal-to-noise 
ratio (for example, we typically used 20 ms exposure for ThT and 100 ms expo- 
sure for APG-4). 

Image analysis. Fiji/ImageJ (National Institutes of Health) and MATLAB 
(MathWorks) were used for image analysis. We generated custom scripts and 
used the image analysis toolbox to perform image segmentation on biofilm phase 
contrast images. To measure biofilm growth rate, we identified the biofilm area in 
each frame by segmenting the images and took the derivative of biofilm radius 
over time. We identified the radius by assuming circular growth of the colony and 
taking the length from the centre of the cell trap to the biofilm edge. To generate 
membrane potential curves, we measured the fluorescence of ThT within the 
biofilm using the ImageJ ‘Plot Z-axis Profile’ command and performed sub- 
sequent analysis, such as normalization and subtracting of baseline signal, 
in MATLAB. 
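
The two analysis steps described above, differentiating the biofilm radius and normalizing a fluorescence trace against a baseline, can be sketched as follows. The original analysis was carried out in ImageJ and MATLAB; this Python version is only an illustration, the data are synthetic, the function names are hypothetical, and the choice of baseline (mean of the first few frames) is an assumption because the exact normalization is not specified in the text.

```python
import numpy as np

def growth_rate(radius_um, time_min):
    """Derivative of biofilm radius over time (um per min), by finite differences."""
    return np.gradient(np.asarray(radius_um, dtype=float),
                       np.asarray(time_min, dtype=float))

def normalize_trace(fluorescence, baseline_frames=5):
    """Subtract a baseline and scale by it.

    The baseline is taken as the mean of the first few frames, which is an
    assumption made for this sketch.
    """
    f = np.asarray(fluorescence, dtype=float)
    baseline = f[:baseline_frames].mean()
    return (f - baseline) / baseline

# Synthetic example: one image every 10 min, radius growing by 2 um per min.
time_min = np.arange(0, 100, 10)
radius_um = 150.0 + 2.0 * time_min
print(growth_rate(radius_um, time_min))

# Synthetic ThT-like trace: flat baseline followed by a single pulse.
tht = np.array([10, 10, 10, 10, 10, 12, 15, 12, 10, 10], dtype=float)
print(normalize_trace(tht))
```

Passing the time stamps to the derivative keeps the result correct when the imaging interval changes between experiments, for example from 10 min to 1 min.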

Experimental reproducibility. Data shown in the main figures were drawn from a 
minimum of three independent experiments and often many more. For example, 
we analysed ThT oscillations (represented in Fig. 1c-e) in over 75 biofilms. In cases 
where only a single representative trace is shown, we analysed multiple regions 
within the biofilm to ensure accuracy of the analysis. In experiments comparing the 
wild-type and a mutant strain (yugO or yugOΔtrkA), we always performed 
head-to-head experiments (separate chambers in the same microfluidic device) on the same 
day using the same media to eliminate possible artefacts. 

Mathematical modelling. The theoretical curves shown in Fig. 3d, e were gen- 
erated using a mathematical model inspired by the Hodgkin-Huxley model of 
neuronal excitability (Supplementary Information). The parameters used in the 
model (Extended Data Table 2) were constrained using a combination of literat- 
ure values and experimental data. Specifically, the response time to KCl shock 
(Fig. 3b) was used as a constraint on parameters with a time dimension and the 
spatial scale (lattice size of the 1D simulations) was extracted from the characteristic 
distance shown in Fig. 3g. 

Theoretical estimate of potassium diffusion within biofilms. We used the 
diffusion coefficient of potassium in water (19.7 × 10⁻⁶ cm² s⁻¹) and reduced 
it to 70% in accordance with a reference on diffusion in biofilms, yielding the 
value of the diffusion coefficient (13.8 × 10⁻⁶ cm² s⁻¹) used in the mathematical 
model as well as the theoretical curves plotted alongside our experimental data. 
To estimate the rate of potassium propagation by diffusion, we used the formula 
for 2D mean squared displacement (MSD): 


$$r = \sqrt{4Dt}$$


where r is the displacement, D is the diffusion coefficient, and t is time. We used 
this relationship to generate the curve shown in Extended Data Fig. 3d. We 
directly compared the log-log slope of the experimental data (slope = 1.1, 
R² = 0.96) to that expected for diffusion (slope = 0.5) to further confirm that 
the experimental data cannot be explained by simple diffusion. 
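
As a worked illustration of this estimate, the Python sketch below evaluates r = sqrt(4Dt) with the diffusion coefficients quoted above and recovers the log-log slope of 0.5 expected for diffusion. The function names and the sample time points are assumptions chosen for the example, not values from the study.

```python
import math

D_WATER = 19.7e-6          # cm^2/s, potassium in water (value quoted in the text)
D_BIOFILM = 0.7 * D_WATER  # cm^2/s, reduced to 70% for biofilms


def diffusion_distance_um(t_s, d_cm2_s=D_BIOFILM):
    """Displacement r = sqrt(4 D t), converted from cm to micrometres."""
    return math.sqrt(4.0 * d_cm2_s * t_s) * 1e4


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Hypothetical time points (seconds), used only to illustrate the slope.
times_s = [60.0, 600.0, 6000.0]
slope = loglog_slope(times_s, [diffusion_distance_um(t) for t in times_s])
```

Because r scales as the square root of t, the slope is 0.5 regardless of the value of D, which is why the measured slope of 1.1 is informative on its own.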

To estimate the decay of amplitude by diffusion, we used the formula for the 
concentration profile of 2D diffusion: 


$$C(r, t) = \frac{M}{4\pi D t}\exp\left(-\frac{r^{2}}{4Dt}\right)$$


where C is the concentration of potassium at displacement r and time t, M is a 
constant related to the initial pulse amplitude of potassium that we matched to the 
initial experimental pulse amplitude of APG-4, and D is the diffusion coefficient. 
We used this relationship to generate the curve in Fig. 2h. 
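
The amplitude decay expected from diffusion alone can be sketched as follows. This Python sketch assumes the standard 2D point-source profile, C(r, t) = M / (4πDt) · exp(−r² / (4Dt)); the function names are hypothetical, and M is left as a free constant, as in the text.

```python
import math


def concentration_2d(r_cm, t_s, m, d_cm2_s):
    """C(r, t) = M / (4 pi D t) * exp(-r^2 / (4 D t)) for a 2D point source."""
    if t_s <= 0:
        raise ValueError("t_s must be positive")
    four_dt = 4.0 * d_cm2_s * t_s
    return m / (math.pi * four_dt) * math.exp(-r_cm ** 2 / four_dt)


def peak_decay(t0_s, t_s):
    """Ratio C(0, t) / C(0, t0): the peak amplitude falls as t0 / t."""
    return t0_s / t_s
```

Under pure diffusion the peak amplitude halves each time the elapsed time doubles, which is the decay that the experimental pulse amplitude is compared against.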

Dyes and concentrations. Thioflavin T (ThT) and DiSC3(5) were used at 10 µM. 
We used ThT and DiSC3(5) to track relative changes in the membrane potential, 
where the fluorescence of ThT increases when the cell becomes more inside nega- 
tive (hyperpolarizes). We found the sensitivity of ThT to be significantly higher 
than that of DiSC3(5), where sensitivity is defined as the ratio between the ampli- 
tude of oscillation and its error (Extended Data Fig. 1a). Furthermore, under our 
experimental conditions, DiSC3(5) appears to be absorbed by the PDMS in the 
microfluidic device. This hinders quantitative analysis (lower sensitivity) and also 
greatly increases the time required for the dye to diffuse into the biofilm. 

APG-4 (TEFLabs) was used at 2 µM. We used the membrane-impermeable 
TMA⁺ salt form to track the extracellular concentration of potassium. We veri- 
fied that APG-4 does not significantly diffuse into cells (Extended Data Fig. 2a). 
We also verified that APG-4 could measure extracellular potassium in MSgg 
media within our microfluidic device (Extended Data Fig. 2b). 

We used ANG-2 (TEFLabs) at 2 µM. We used the membrane-impermeable 
TMA⁺ salt form to track the extracellular concentration of sodium (Extended 
Data Fig. 3c, d). 
Extended Data Figure 1 | Thioflavin T (ThT) is a fluorescent reporter that is 
inversely related to the membrane potential. a, ThT and DiSC3(5), an 
established reporter of membrane potential in bacteria, both oscillate within 
biofilms. ThT has an approximately threefold higher sensitivity to changes 
in membrane potential compared to DiSC3(5). Sensitivity is defined as the ratio 
between peak height and error in peak height. Error bars indicate mean ± s.d. 
(n = 8 biofilm regions, averaged over the 4 pulses shown). b, The cellular 
ThT fluorescence depends on the external pH, where higher pH results in 
greater membrane potential, as expected. ThT itself is insensitive to these pH 
changes and the traces are background subtracted to eliminate possible 
artefacts. Representative trace is selected from three independent biofilms. 
c, Oscillations in ThT and growth rate are inversely correlated, linking 
membrane potential oscillations to the metabolic cycle which produces 
periodic growth pauses. Growth rate is calculated by taking the derivative of 
biofilm radius over time (Supplementary Information). Representative trace 
is selected from over 75 independent biofilms. d, Replacing glutamate with 
0.2% glutamine, which eliminates the need to take up glutamate or retain 
ammonium, quenches ThT oscillations. This further suggests that ThT 
oscillations are specific to the metabolic cycle involving glutamate and 
ammonium. A representative trace was selected from three independent 
experiments. 
Extended Data Figure 2 | A fluorescent reporter of extracellular potassium 
(APG-4) indicates that potassium has a role in membrane potential oscilla- 
tions. a, High-resolution images showing the intracellular localization of 
ThT and primarily extracellular localization of APG-4 (top). Quantification of 
ThT and APG-4 along the 2 µm profile indicated in the phase image indicates 
that APG-4 does not significantly diffuse into the cell (bottom). Representative 
images are selected from six independent experiments. b, Induction curve for 
APG-4 generated using externally supplemented KCl. The experiment was 
repeated twice. c, Oscillations in extracellular potassium in the surrounding 
cell-free region during biofilm oscillations. These oscillations occurred during 
the experiment shown in Fig. 2b, c and the pulses are synchronized between the 
biofilm and the surrounding cell-free region. Representative trace is selected 
from six independent experiments. d, Induction curve for ANG-2 generated 
using externally supplemented NaCl. The experiment was repeated twice. 

e, Simultaneous measurement of ThT and ANG-2 indicates a lack of 
oscillations in extracellular sodium. Representative trace selected from three 
independent biofilms. f, Furthermore, perturbing extracellular sodium 
concentrations in the media had no detectable effect on membrane potential 
oscillations. A representative trace was selected from four independent 
experiments. 
Extended Data Figure 3 | Active propagation of potassium signal within 
the biofilm. a, A chemical potassium clamp (300 mM KCl, matching the 
intracellular concentration, and 30 µM valinomycin) prevents the formation 
of potassium electrochemical gradients across the cellular membrane. 
Valinomycin is an antibiotic that creates potassium-specific carriers in the 
cellular membrane. b, Clamping net potassium flux quenches oscillations in 
membrane potential. A representative trace was selected from two independent 
biofilms. c, Propagation of extracellular potassium is estimated by tracking 
the half-maximal position of the pulse over time. Representative traces are 
shown for a single pulse selected from one of six independent experiments. 
d, Propagation of extracellular potassium is relatively constant over time in 
contrast to diffusion that is expected to decay. The diffusion line is calculated 
using the mean squared displacement (MSD) and the diffusion coefficient 
for potassium in biofilms (Supplementary Information). Slopes are calculated 
from the same representative data shown in c. 
Extended Data Figure 4 | External potassium affects the metabolic state of 
the cell. a, A potassium shock (300 mM KCl) produces an initial ThT decrease 
(depolarization) followed by a period of sustained ThT increase (hyper- 
polarization). ThT is inversely related to the membrane potential. 

A corresponding pulse in APG-4 during this ThT increase suggests that 
hyperpolarization is due to release of potassium. APG-4 signal due to the 
external potassium shock itself was subtracted using the cell-free background 
near the biofilm. A representative trace was selected from three independent 
experiments. b, ThT spikes in response to external potassium shock (300 mM 
KCl) but not an equivalent shock of 300 mM sorbitol, an uncharged solute. 
A representative trace was selected from three independent experiments. c, The 
hyperpolarization response occurs when cells are grown in glutamate but not 
when glutamate is replaced by 0.2% glutamine, which bypasses the need to 
take up glutamate or retain ammonium. A representative trace was selected 
from four independent biofilms. 
Extended Data Table 1 | List of strains used in this study 

Strain       Genotype                Source 
Wild type    B. subtilis NCIB 3610   1 
ΔyugO        yugO::neo               This study 
yugOΔtrkA    trkA::neo               This study 
Extended Data Table 2 | Parameter values used in the model 
Architecture of the mammalian 
mechanosensitive Piezo1 channel 


Jingpeng Ge, Wanqiu Li, Qiancheng Zhao, Ningning Li, Maofei Chen, Peng Zhi, Ruochong Li, Ning Gao, 
Bailong Xiao & Maojun Yang 


Piezo proteins are evolutionarily conserved and functionally diverse mechanosensitive cation channels. However, the 
overall structural architecture and gating mechanisms of Piezo channels have remained unknown. Here we determine 
the cryo-electron microscopy structure of the full-length (2,547 amino acids) mouse Piezo1 (Piezo1) at a resolution of 
4.8 Å. Piezo1 forms a trimeric propeller-like structure (about 900 kilodaltons), with the extracellular domains resembling 
three distal blades and a central cap. The transmembrane region has 14 apparently resolved segments per subunit. These 
segments form three peripheral wings and a central pore module that encloses a potential ion-conducting pore. 
The rather flexible extracellular blade domains are connected to the central intracellular domain by three long 
beam-like structures. This trimeric architecture suggests that Piezo1 may use its peripheral regions as force sensors 
to gate the central ion-conducting pore. 


Mechanosensitive cation channels have key roles in converting mech- 
anical stimuli into various biological activities, such as touch, hearing 
and blood pressure regulation, through a process termed mechano- 
transduction. Piezo proteins have recently been identified as pore- 
forming subunits of the long-sought-after mechanosensitive cation 
channels in metazoans. A single fly Piezo gene has been shown to be 
involved in mechanical nociception. There are two Piezo proteins in 
vertebrates: Piezo1 and Piezo2. In vertebrates, including fish, birds, 
rodents and humans, Piezo2 mediates gentle touch sensation. By 
contrast, Piezo1 has broad roles in multiple physiological processes, 
including sensing shear stress of blood flow for proper blood vessel 
development, regulating red blood cell function and control- 
ling cell migration and differentiation. In humans, mutations of 
PIEZO1 or PIEZO2 have been linked to several genetic diseases, 
including dehydrated hereditary stomatocytosis, distal arthrogry- 
posis type 5 (ref. 28), Gordon syndrome and Marden-Walker syn- 
drome. These findings demonstrate the functional importance of 
Piezo channels, as well as their pathological relevance and potential 
as therapeutic targets. 

Despite the functional importance of Piezo proteins, their gating 
mechanisms and three-dimensional (3D) structures are yet to be 
defined. They do not bear notable sequence and structural homology 
to any known classes of ion channel, such as voltage- or ligand-gated 
channels, transient receptor potential (TRP) channels, prokar- 
yotic mechanosensitive channels or eukaryotic mechanosensitive 
two-pore-domain potassium channels. Mammalian Piezo proteins 
contain more than 2,500 residues with numerous predicted trans- 
membrane segments and form homo-oligomerized channel 
complexes. However, the exact stoichiometry, topology, architecture 
and functional domains involved in pore formation, force sensing and 
regulation remain to be solved. 

Combining protein engineering, X-ray crystallography, single- 
particle cryo-electron microscopy and live-cell immunostaining, we 
have obtained the medium-resolution structure of the full-length 
Piezo1 channel. Our results provide key insights into the ion-conducting 
and gating mechanisms of this novel class of mechanosensitive ion 
channels. 


Piezo1 forms a homotrimer 


Our initial effort was focused on obtaining a sufficient amount of 
acceptably homogenous Piezo proteins. Human, mouse and 
Drosophila Piezo complementary DNAs, in full-length or truncated 
forms, were cloned into a vector encoding a carboxy-terminal 
(C-terminal) glutathione S-transferase (GST) tag with a precision 
protease cleavage site in between (Piezo1-pp-GST). Constructs were 
tested for their expression using transient transfection in HEK293T 
cells. A large number of detergents in various classes were screened for 
their compatibility with the extraction and purification of Piezo pro- 
teins. Finally, a combination of mouse Piezo1 with the detergent 
C12E10 was used for purification and structural determination. 

Gel filtration chromatography showed that Piezo1-pp-GST and 
Piezo1 without the GST tag both contained two forms of oligomer, but 
at different ratios (Fig. 1a-c and Extended Data Fig. 1). On native gels, 
Piezo1-pp-GST migrated as a major band at a molecular weight of 
about 1,200 kDa and a minor one at about 900 kDa (Fig. 1c). This 
result seemed consistent with a previous study, which suggested that 
Piezo1 fused to GST formed a homotetramer. However, examination 
of Piezo1-pp-GST proteins by negative-staining electron microscopy 
showed an ostensibly dimeric arrangement of particles (Fig. 1d, e). 
Two-dimensional (2D) classification of these particles indicated that 
the two halves were highly similar (Fig. 1f), suggesting that the dimer- 
ized GST tag may mediate further dimerization of Piezo1 complexes. 
Consistent with this possibility, Piezo1 with the GST tag cleaved dis- 
played mainly a molecular weight of 900 kDa on native gels (Fig. 1c). 
Moreover, almost no particles with the dimeric arrangement could be 
observed in the tag-free Piezo1 sample. Rather, particles with a three- 
fold symmetry were clearly detected (Fig. 1g-i). As a further con- 
firmation, Flag-tagged Piezo1 displayed a major band at about 
900 kDa on native gels (Fig. 1c). Thus, our data suggest that the major 
oligomeric state of the purified Piezo1 is trimeric. The majority of 
Piezo1-pp-GST fusion proteins form a dimer of trimers, as a result of 
the dimerized GST tags. 

The unusual migration of the 1,900-kDa Piezo1-pp-GST dimer of 
trimers near the 1,200-kDa marker might have led to the incorrect 
characterization of Piezo1-pp-GST as a tetramer in the previous 
report. The large native size of the protein, together with its numer- 
ous transmembrane segments, might have resulted in its unusual 
mobility on native gels owing to the influence of the detergents. 
Nevertheless, we could not completely exclude the possibility that 
Piezo1 exists in other oligomeric states on the membrane or under 
different conditions in vitro, a scenario observed in previous studies of 
other ion channels (for example, Orai channels). 
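
The quoted sizes can be sanity-checked with a back-of-the-envelope estimate. The Python sketch below is not part of the study: it assumes an average residue mass of about 110 Da and a GST tag of about 26 kDa, both rough textbook figures, and it ignores bound detergent and linker residues.

```python
AVG_RESIDUE_DA = 110.0   # assumption: rough average mass per amino-acid residue
GST_TAG_DA = 26_000.0    # assumption: approximate mass of one GST tag


def oligomer_mass_kda(n_residues, n_subunits, tag_da=0.0):
    """Estimated mass (kDa) of an oligomer of identical, optionally tagged subunits."""
    return n_subunits * (n_residues * AVG_RESIDUE_DA + tag_da) / 1000.0


# Full-length mouse Piezo1 has 2,547 residues (from the text).
trimer = oligomer_mass_kda(2547, 3)                         # untagged trimer
dimer_of_trimers = oligomer_mass_kda(2547, 6, GST_TAG_DA)   # six GST-tagged subunits
```

Under these assumptions the trimer comes to roughly 840 kDa and the GST-tagged dimer of trimers to roughly 1,840 kDa, in the same range as the 900 kDa and 1,900 kDa figures given in the text.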


Three-blade, propeller-shaped Piezo1 homotrimer 


Using a single-particle approach during cryo-electron microscopy, we 
determined the trimeric structure of Piezo1 (Fig. 2a-d and Extended 
Data Figs 2-5). Notably, the density map revealed that Piezo1 formed 
a three-blade, propeller-shaped architecture, with distinct regions 
resembling the typical structural components of a propeller, including 
three blades and a central cap. Viewed from the top, the diameter and 
the axial height of the structure are 200 Å and 155 Å, respectively 
(Fig. 2d). The transmembrane region could be readily located and 
contains many paired density rods, in good agreement with the 2D 
analyses (Fig. 2c-f). The transmembrane region contains three 
extended and twisted arrays of transmembrane helices (Fig. 2f, second 
from left). Beyond the transmembrane helical array, three thick distal 
blades are arranged in a superhelical fashion and each blade also has a 
helicoidal surface (Fig. 2d, e and f, second from right). A single central 
cap sits above the surface of the transmembrane core with a gap 
(~8 Å) in between (Fig. 2e). Furthermore, a tightly packed region, 
likely to be a compact soluble domain, is located on the opposite side 
of the cap, right below the transmembrane region (Fig. 2e). Three 
long, distinct density rods exposed on the outer surface of the trans- 
membrane region, hereafter termed beam, seem to connect the distal 


Figure 1 | Piezo1 forms a homotrimer. a, A 
representative trace of gel filtration of the full- 
length Piezo1, with molecular weight markers 
indicated. UV, ultraviolet. b, Protein samples of the 
indicated fractions were subjected to SDS-PAGE 
and Coomassie blue staining. c, Native gel and 
western blotting analysis of GST-cleaved 
Piezo1 (PPase), Piezo1-pp-GST (GSH) and 
Piezo1-Flag (Flag) samples with an anti-Piezo1 
antibody. d, A representative micrograph of the 
negatively stained Piezo1-pp-GST. e, Raw 
particles of Piezo1-pp-GST. f, 2D class averages of 
Piezo1-pp-GST particles. g, A representative 
micrograph of the negatively stained Piezo1. 
h, Raw particles of Piezo1. i, 2D class averages of 
Piezo1 particles. 


end of the transmembrane region and the blades mechanically to the 
centre of the trimeric complex at the bottom face. The diameter of the 
density rod suggests that the beam is composed of a two-stranded 
coiled coil (Fig. 2d, e). 


Topology determination 


The proposed detachment of the cap from the transmembrane core 
indicates that it is likely to be a soluble region. A topological predic- 
tion model suggests that residues from 2210 to 2457 (termed the 
C-terminal extracellular domain, CED) constitute a large extracellular 
loop followed by the last transmembrane segment at the C terminus. 
To test whether this region constitutes the cap, we constructed and 
purified the deletion-mutant Piezo1(Δ2219-2453) and examined it 
by negative-staining electron microscopy. 2D classification of 
Piezo1(Δ2219-2453) particles revealed the central cap was absent 
in 2D class averages (Extended Data Fig. 6a, b), confirming that this 
region indeed forms the cap. 

Next, we solved the crystal structure of the CED (Piezo1(2214- 
2457)) (Fig. 2g and Extended Data Table 1), which was similar to that 
of the same region of Caenorhabditis elegans Piezo reported 
recently. The root-mean-square deviation of 181 aligned α-carbon 
atoms between these two structures is 1.7 Å (Extended Data Fig. 6c, d). 
The amino (N) and C termini of the CED are on the same side 
and close to each other (Fig. 2g), consistent with the topological 
prediction that the CED is located between the last two transmem- 
brane segments in the C-terminal region of Piezo1. 
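
For readers unfamiliar with the metric, the root-mean-square deviation over paired atoms can be sketched as below. This Python sketch is illustrative only: it assumes the two coordinate sets have already been aligned and superposed, and the function name is hypothetical. It does not reproduce the alignment software used in the study.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points, already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Applied to the 181 aligned α-carbon pairs, a function of this kind would yield a single distance in ångströms, comparable to the 1.7 Å reported above.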

The CED formed a trimer in both gel filtration and crystal lattice 
(Extended Data Fig. 6d, e). A direct and rigid fitting of the crystal- 
lographic trimer of the CED into the cryo-electron microscopy den- 
sity map resulted in a match, with a correlation coefficient of 0.89 
(Extended Data Fig. 6f). These results demonstrate that the cap is 
formed by a CED trimer, further supporting the conclusion that 
the full-length Piezo1 forms a homotrimer. Furthermore, the high 
consistency between the crystal structure and the cryo-electron 
Figure 2 | Overall structure of Piezo1. a, A 
representative cryo-electron micrograph of Piezo1. 
b, Power spectrum of the micrograph in a, with 
the 3-Å frequency indicated. c, Representative 2D 
class averages of Piezo1 particles, showing fine 
features of the trimeric complex. d, Top, bottom 
and side views of an unsharpened map (5σ contour 
level) of Piezo1, with distinct regions labelled. 
The dimensions of the trimeric structure are shown 
in the rightmost panel. e, Side view of the 
sharpened map (6σ contour level) of Piezo1 filtered 
to a resolution of 4.8 Å, with the transmembrane 
region indicated. f, Selected z-slices of the final 
sharpened map corresponding to the layers 
indicated by the numbered arrows in e. g, The 
cartoon model of the crystal structure of a single 
C-terminal extracellular domain. The dashed 
line indicates the missing residues. The Flag tag 
was inserted after residue A2419. h, Immuno- 
staining of cells transfected with the indicated 
constructs with an anti-Flag antibody either in 
live labelling (top row) or after fixation and 
permeabilization (bottom row). Scale bars, 10 µm. 
GFP, green fluorescent protein; IRES, internal 
ribosome entry site. 
microscopy map of the cap domain confirmed the correctness of the 
density map and determined the handedness of the map. 

To further confirm the topological location of the CED and the 
C terminus of Piezo1, we performed immunolabelling of live 
HEK293T cells expressing Piezo1 with a Flag tag fused either in a 
flexible loop of the CED (after A2419) or at the C terminus of Piezo1. 
Using confocal microscopy, we found that the Flag tag could be 
labelled on the plasma membrane of live cells only when inserted in 
the CED and not at the C terminus (Fig. 2h). These data demonstrate 
that the CED is an extracellular domain, whereas the C terminus is 
intracellular, consistent with a recent report. Consequently, this 
suggests that both the central cap and the three blades locate at the 
extracellular side, whereas the beams locate at the intracellular side. 


The transmembrane skeleton 


Piezo proteins have been predicted to contain an unusually large num- 
ber of transmembrane segments (about 30-40) in one molecule. 
Several potential topology models of Piezo have recently been pro- 
posed, with the number of transmembrane segments ranging from 
10 to 38 (ref. 40). The local resolution of the cryo-electron microscopy 
density map shows that the transmembrane region is associated with a 
higher resolution, which allowed us to build a de novo alanine model 
with 492 amino acids for the more readily identified transmembrane 
segments, beam and the intracellular C-terminal domain (CTD). 
Together with the 227 amino acids of the CED, we built a total of 
719 residues (out of 2,547 amino acids) for each monomer (Fig. 3a 
and Extended Data Figs 7, 8). The whole transmembrane skeleton 
displays a three-winged arrangement, with each extended wing being 
slightly twisted (Fig. 3b). From the map, 14 transmembrane segments 
could be readily recognized on each wing. A potential topology of at 
least 14 transmembrane segments for each protomer is consistent with 
a recent topology model of 18 transmembrane segments, instead of 38 
transmembrane segments. In line with this observation, a single blade 
has a volume comparable to the cap region, which is made up of about 
700 residues. Thus, some of the predicted N-terminal helices should 
reside in the distal extracellular regions. 

To facilitate the description of our structure and based on known 
features of ion channels, we refer to the core transmembrane segments 
as inner helix (IH) and outer helix (OH) and to the peripheral trans- 
membrane arrays as peripheral helix (PH) (Fig. 3). The 12 PHs from 
the same monomer are organized as six helical pairs, extending from 
the central axis to the periphery of the complex (Fig. 3b). They are 
connected to the extracellular blade. The density for the connecting 
sequences from PH1 to PH7 allowed us to make tentative connections 
between them, except for the connection between PH4 and PH5 
(Extended Data Fig. 8a). 

Main-chain tracing of the PH1, IH and OH towards the transmem- 
brane core in the density map, together with the information from 
topology (Fig. 3c) and secondary structure prediction (Extended Data 
Fig. 9), allowed us to map these three transmembrane segments on the 
primary sequence and assign some of the linker sequences between 
them into the corresponding density features. These analyses suggest 
that the OH connects to PH1 through four continuous α-helices, 
which form a unique hairpin structure at the interface of two adjacent 
subunits. This hairpin structure, termed the anchor, penetrates into 
the inner leaflet of the membrane, with a long helix (4°) roughly 
parallel to the membrane (Fig. 3a, right and Extended Data Fig. 8b). 
The remaining density features in the map include the IH and 
its connecting density (also four α-helices) all the way to the intra- 
cellular surface of the channel, suggesting that the IH is the last 
Figure 3 | Organization of the transmembrane skeleton. a, A side view of 
the cryo-electron microscopy density map superimposed with separately 
coloured poly-alanine models of each subunit. The boxed region is enlarged 
to illustrate the anchor domain. b, A z-slice representation of the overall 
organization of the transmembrane skeleton of the layer indicated by the 
blue dashed line in a. The boxed region is amplified to illustrate the central 
transmembrane core that consists of three IHs and three OHs and wings of the 
peripheral helices (PH1-PH12). Owing to the ambiguity in the connection, 
the three IHs are not assigned to each subunit and thus labelled as IH, IH’ and 
IH”. c, The model represents the topology of the C-terminal part of Piezo1. 
Different structural units are indicated. 


transmembrane segment from the C terminus. In line with this 
assignment, the intracellular C terminus is located at the centre of 
the intracellular side, as indicated by the location of the C-terminal 
GST tag in Piezo1-pp-GST. 

Together with the finding that the CED is inserted between the last 
two transmembrane segments from the C terminus, the OH is likely to 
be the second-to-last transmembrane segment from the C terminus, 
because of the close distance (matching the length of the linker 
sequences) between the N terminus of the CED and the extracellular 
end of the OH (Fig. 3a and Extended Data Fig. 9). In addition, the 
distance constraint enabled us to put a connection between a specific 
OH and one of the three N termini of the CED domain. However, we 
cannot unambiguously connect a specific IH to the three possible C 
termini of the CED. 

Nevertheless, with the primary sequence of the PH1-anchor-OH- 
CED from one monomer fixed in the density map, a clear separation 
of the three subunits on the 3D structure could be achieved (Fig. 3a). 
The presence of the anchor domain also seems to result in a clockwise 
swapping of the OH-CED of one monomer (viewed from the cap) 
into a region of the neighbouring monomer. This helix-swapping 
arrangement might be critical for the stabilization of the Piezo1 tri- 
mer. Although unambiguous sequence assignment at the residue level 
was not feasible, this anchor domain of Piezo1 could be mapped to 
residues around 2100 to 2190, a region containing the most evolutio- 
narily conserved sequence motif, PF(X2)E(X6)W (2129-2140), 
among Piezo homologues (Extended Data Fig. 9). The disease-caus- 
ing mutation Piezo1(T2142) (T2127 of PIEZO1 in humans) is 
located in this region, supporting the functional relevance of the 
anchor. Another mutation targeting this motif, Piezo1(E2133), was 
found to affect the Piezo1 channel pore properties. 

Each wing of the transmembrane region sits on a coiled-coil beam 
exposed at the intracellular surface. The beam is about 80 Å in length 
Figure 4 | Putative ion-conducting pore. a, Surface representation 
(transparent) of the segmented map of the putative pore module, including the 
OH, CED, IH and CTD. b, Same as a, but the model is superimposed with 
the putative ion-conducting pore (deep blue), produced by HOLE with the 
poly-alanine model and the CED crystal structure. c, Central slice of the 
rotationally averaged density map, highlighting a continuous central pore along 
the z-axis (red dotted line). The extracellular vestibule (EV), transmembrane 
vestibule (MV) and intracellular vestibule (IV) regions are labelled. d, A side 
view of the CTD and the pore module consisting of the OH, IH and the 
CTD helices. e, Same as d, but viewed from the intracellular side. 


and positioned at about 30° relative to the membrane (Fig. 3a). It 
originates peripherally at the intracellular side of the PH7-PH8 pair 
and ends near the central axis of the trimer, where it seems to interact 
with the anchor and the CTD (Fig. 3a). This organization suggests that 
the three beams might be responsible for transmitting conformational 
changes from peripheral transmembrane segments and the extracel- 
lular blades to the central region, where the ion-conducting pore is 
most likely to reside. 


The ion-conducting pore 

The centre of the Piezo1 channel within the membrane consists of six 
transmembrane helices in a triangular arrangement (Fig. 3b, right and 
Fig. 4). Three IHs, presumably extended from the C termini of the 
CEDs, are located at the innermost position and seem to line a central 
pore. Three OHs, extended from the N termini of the CEDs, further 
enclose the three IHs (Fig. 4a). This central region, including the IH- 
OH pairs, the CEDs and the CTDs, probably comprises the pore 
module of Piezo1. The lack of side-chain information in the three 
IHs prevented us from accurately determining the radius of the pore. 
Nonetheless, apparent restriction sites could be readily detected, sug- 
gesting that they are potential gating positions. The central slice of the 
rotationally averaged density map revealed a continuous central 
channel along the z-axis, including an extracellular vestibule within 
the cap, a transmembrane vestibule enclosed by the three IHs and an 
intracellular vestibule formed by the trimeric CTD (Fig. 4b-e). The 
organization of the central transmembrane core and the pore is 
reminiscent of the trimeric P2X4 channels and acid-sensing ion 
channels, although they possess only two transmembrane helices 
and a large extracellular domain in each monomer. Based on this 
structural information, we propose that the OH-CED-IH-CTD-con- 
taining region functions as the pore module of Piezo channels (Fig. 4). 
According to our assignment, this pore module comprises the 
C-terminal region from residues 2172 to 2547. This is consistent with 
a recent study showing that the portion from 1974 to the C terminus 
of Piezo1 is essential for ion permeation properties. 


The flexible blades as potential force sensors 

The local resolution map shows that the three blades of Piezo1 have 
smeared densities at their distal ends and fragmented density in the 
sharpened map (Fig. 2d, e). In contrast, the cap, transmembrane 
skeleton, beam and CTD are better defined and display apparent 
secondary structural features. The blades of Piezo1 are highly flexible 
(Figs 2c, 3a and Extended Data Fig. 8). Indeed, comparison of differ- 
ent classes of the structures from symmetry-free 3D classification 
reveals several motion modes for the blade (Fig. 5a, b and Extended 
Data Fig. 5a). The most notable one is that the rotational spacing 
between two adjacent blades varies from 100° to 140° (Fig. 5a). 
Other less pronounced but identifiable conformational variations 
include the tilting of the blade relative to the plasma membrane and 
curvature changes on the helicoidal surface (Fig. 5b). Further support- 
ing the structural flexibility of the blade regions, subregion refinement 
(see Methods) considerably improved the densities of the cap, but not 
that of the blade. The large conformational heterogeneity in the blades 
could be the main factor hampering high-resolution structural refine- 
ment of the entire structure. However, the structural flexibility of the 
propeller-like blades could be functionally meaningful. For example, 
they might serve as sensors of mechanical force exerted on the channel, 
thus contributing to mechanical gating of Piezo1 (Fig. 5c). 

The recently resolved cryo-electron microscopy structure of 
human TRPA1 reveals a fourfold propeller-like structure composed 
of numerous ankyrin repeats. Although TRPA1 alone is not sufficient 
to mediate mechanosensitive currents, it has been proposed to mediate 
slowly adapting mechanically activated currents in somatosensory 
neurons, raising an intriguing possibility that TRPA1 may employ 
the propeller-like structure to confer mechanosensitivity under certain 
circumstances. It remains possible that other extracellular or 
intracellular proteins may interact with and regulate Piezo channels. 
These hypotheses merit further investigation. 

Figure 5 | Conformational heterogeneity of the ‘blade’ and a proposed 
model of force-induced gating of Piezo channels. a, Representative 
classes of Piezo1 structures from symmetry-free 3D classification. For 
each top-viewed structure, three black lines (120° interval) are drawn 
to illustrate the expected position of blades on the basis of perfect 
C3 symmetry. Red dashed lines represent observed positions of the 
blades. b, Structural comparison between further-refined maps of 
structures 4 (orange) and 3 (cyan) in a, showing the centripetal 
movement of the blades (top) and the tilted movement of the beams 
relative to the plasma membrane plane (bottom). c, Proposed model of 
the force-induced gating of Piezo channels. The blue and orange models 
represent the closed and open state channels, respectively. Red dashed 
lines indicate the possible ion-conduction pathways. Presumably, 
force-induced motion (red arrows) of the peripheral blade or PHs leads 
to conformational arrangement and gating of the channel. 


Conclusions 


The medium-resolution cryo-electron microscopy structure of Piezo1 
provides critical insights into the general architecture, oligomeriza- 
tion state and topological organization of Piezo channels. Our putat- 
ive assignment of the central ion-conducting pore, mechanosensing 
and transduction components serves as a testable framework for dis- 
section of the structure and mechanism of this class of channels. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS 


No statistical methods were used to predetermine sample size, the experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Molecular cloning. The pcDNA3.1-Piezo1-pp (PPase, PreScission protease 
cleavage site)-GST-IRES-GFP construct was subcloned by inserting the coding 
sequence of the PreScission protease cleavage site between the Piezo1 
(UniProtKB entry E2JF22) and GST coding sequences in the parental construct 
pcDNA3.1-Piezo1-GST-IRES-GFP. Piezo1-Cterm-Flag-IRES-GFP was subcloned by 
inserting the synthesized double-stranded DNA fragment encoding Flag between 
the Piezo1-coding sequence and IRES-GFP using the restriction enzymes AscI 
and SacII. Piezo1(A2419)-Flag-IRES-GFP was constructed using a one-step 
cloning kit (Vazyme Biotech) by introducing the Flag-tag coding sequence 
after the residue A2419 of Piezo1 into the Piezo1-GST-IRES-GFP construct, 
and the Piezo1(Δ2219–2453) construct was generated by deleting amino acids 
2219–2453 from the Piezo1-pp-GST-IRES-GFP construct. The coding sequence of 
the CED of Piezo1 (residues 2214–2457) was cloned into a pET22b (Novagen) 
vector with a C-terminal 6×His tag using the restriction enzymes NdeI and 
XhoI. All the constructs were confirmed by sequencing. 

Protein expression and purification of Piezo1 and Piezo1(Δ2219–2453). 
HEK293T cells were grown in DMEM (basic) with 10% FBS. When the density 
of cells cultured in 150 mm × 25 mm dishes reached 80–90%, the expression 
plasmids were transiently transfected with polyethylenimines (Polysciences). 
The protein purification procedure was slightly modified from similar 
previously described methods. After 48 h, the transfected cells were 
collected, washed twice with PBS and homogenized in buffer A, containing 
25 mM Na-PIPES, pH 7.2, 140 mM NaCl, 2 mM dithiothreitol (DTT), detergents 
CHAPS (1%) and C12E9 (0.1%), 0.5% (w/v) L-α-phosphatidylcholine (Avanti) 
and a cocktail of protease inhibitors (Roche) at 4 °C for 1 h. After 
centrifugation at 100,000g for 40 min, the supernatant was collected and 
incubated with glutathione-sepharose beads (GE Healthcare) at 4 °C for 3 h. 
The resin was washed extensively with buffer B, containing 25 mM Na-PIPES, 
pH 7.2, 140 mM NaCl, 2 mM DTT, 0.1% (w/v) C12E9 and 0.01% (w/v) 
L-α-phosphatidylcholine. The GST-free or GST-tagged Piezo1 was cleaved off 
by PreScission Protease (Amersham-GE) in buffer B at 4 °C overnight or 
directly eluted from the protein-loaded resin with buffer B plus 10 mM GSH, 
respectively, and applied to size-exclusion chromatography (Superose-6 
10/300 GL, GE Healthcare) in buffer C (25 mM Na-PIPES, pH 7.2, 140 mM NaCl, 
2 mM DTT) plus 0.026% (w/v) C12E10 or other detergents at a final 
concentration of 2× critical micelle concentration. For amphipol-bound 
Piezo1, amphipols were substituted for detergents as described, after 
which the protein was loaded on a Superose-6 column in buffer C. Proteins 
with different kinds of detergents or amphipols were examined by both gel 
filtration and negative staining. Peak fractions representing oligomeric 
Piezo1 were collected for electron microscopy analysis. Protein in C12E10 
was used for final cryo-electron microscopy structure determination. All 
detergents and amphipols used in this project were purchased from Anatrace. 

Expression and purification of Piezo1 CED fragment. Overexpression of 
Piezo1 CED was induced in Escherichia coli BL21 strain by 0.5 mM 
isopropyl-β-D-thiogalactoside when the cell density reached an optical 
density of ~0.8 at 600 nm. After growing at 18 °C for 12 h, the cells were 
collected, washed, resuspended in buffer D, containing 25 mM Tris-HCl, 
pH 8.0, 500 mM NaCl and 20 mM imidazole, and lysed by sonication. The 
lysates were clarified by centrifugation at 23,000g for 1 h and the 
supernatant was collected and loaded onto Ni²⁺-nitrilotriacetate affinity 
resin (Ni-NTA, Qiagen). The resin was washed extensively with buffer D and 
eluted with buffer D plus 280 mM imidazole. The eluate was concentrated and 
subjected to gel filtration (Superdex-200, GE Healthcare) with buffer E, 
containing 25 mM Tris-HCl, pH 8.0, 200 mM NaCl, 2 mM DTT, or buffer F, 
containing 25 mM Tris-HCl, pH 8.0, 25 mM NaCl and 2 mM DTT 
(Extended Data Fig. 6e). 

NativePAGE Novex Bis-Tris gel and western blotting. The purified Piezo1 
proteins were subjected to 3–12% NativePAGE Novex Bis-Tris gel for native 
electrophoresis according to the manufacturer’s protocol at 150 V for 2 h. 
The native gel was transferred to a positively charged nylon/nitrocellulose 
membrane at 100 V for 1.5 h. After incubating in 8% (v/v) acetic acid to 
fix the proteins, air drying and rewetting with methanol, the membrane was 
blocked with 5% (w/v) milk in TBS buffer with 0.1% (w/v) Tween-20 (TBST 
buffer) at room temperature (~26 °C) for 1 h. The membrane was then 
incubated with the anti-Piezo1 antibody (1:1,000) (custom generated using 
the peptide YIRAPNGPEANPVK) at room temperature for 1 h, followed by 
washing with TBST buffer and further incubation with anti-rabbit IgG 
antibody (1:10,000) at room temperature for 1 h. Proteins were detected 
with the SuperSignal West Pico Chemiluminescent Substrate (Thermo). 


Immunostaining. For live-cell labelling, cells grown on coverslips were incu- 
bated with the anti-Flag antibody (1:100, Sigma) diluted in prewarmed culture 
medium at room temperature for 1h. After three washes, cells were incubated 
with the Alexa Fluor 594 donkey-anti-mouse IgG secondary antibody (1:200, Life 
Technologies) at room temperature for 1 h and then washed and fixed with 4% 
(w/v) paraformaldehyde. For permeabilized staining, cells were first fixed with 4% 
(w/v) paraformaldehyde and permeabilized with 0.2% (w/v) Triton X-100, then 
incubated with the anti-Flag antibody (1:200, Sigma) or the anti-GST antibody 
(1:200, Millipore) at room temperature for 1 h. Cells were washed and then 
incubated with the Alexa Fluor 594 donkey-anti-mouse IgG (1:200, Life Techno- 
logies) or Alexa Fluor 594 donkey-anti-rabbit IgG (1:200, Life Technologies) 
secondary antibody at room temperature for 1 h. After washing, coverslips were 
mounted and imaged using a Nikon A1 confocal microscope with a 60× oil 
objective (N.A. = 1.49) at either the GFP (488-nm exciting wavelength) or the 
TRITC channel (561-nm exciting wavelength). 

Crystallization, data collection and structure determination of the CED. 
Crystals of CED proteins were obtained at 18 °C using the sitting-drop 
method by mixing 1 μl protein (15 mg ml⁻¹) with 1 μl reservoir solution 
(0.1 M HEPES, pH 7.5, 0.2 M MgCl₂ and 25% w/v PEG3350). Crystals appeared 
after 2–3 weeks and reached full size in about a month. The crystals were 
cryo-protected in reservoir solution containing 15–20% glycerol and flash 
frozen in liquid nitrogen before data collection. Native data of CED 
crystals were collected at beamline BL17U of the Shanghai Synchrotron 
Radiation Facility (SSRF). Single-wavelength anomalous dispersion data were 
collected at 100 K using a MAResearch M165 charge-coupled device (CCD) 
detector at the Beijing Synchrotron Radiation Facility (BSRF), with the 
crystals soaked in 2 M NaI for 1 min. All diffraction data were processed 
with HKL2000 (ref. 49). Further processing was carried out using programs 
from the CCP4 suite (Collaborative Computational Project). The heavy-atom 
positions in the iodine-soaked crystal were determined using SHELXD. 
Heavy-atom parameters were then refined and initial phases were generated 
in the program PHASER using the single-wavelength anomalous dispersion 
experimental phasing module. The real-space constraints were applied to 
the electron density map in DM. The resulting map was of sufficient 
quality for building the model of the CED in Coot. The structures were 
refined with the PHENIX packages. Full data collection and structure 
statistics are summarized in Extended Data Table 1. 

Negative-staining electron microscopy. An aliquot of 4 μl Piezo1 
(0.05 mg ml⁻¹) was applied to glow-discharged carbon-coated copper grids 
(200 mesh, Zhongjingkeyi, Beijing). After the grids were incubated at room 
temperature for 1 min, excess liquid was absorbed by filter paper. Grids 
containing the specimen were stained by applying droplets of 2% uranyl 
acetate for 30 s and air dried. Micrographs were generated on a T12 
microscope (FEI) operated at 120 kV, using a 4k × 4k CCD camera (UltraScan 
4000, Gatan). Images of Piezo1 purified with C12E10, C12E8 and amphipol 
A8-35 were recorded at a nominal magnification of 68,000× and with a pixel 
size of 1.59 Å (Extended Data Fig. 2). Images of Piezo1(ΔCED) in C12E10 
were recorded at a nominal magnification of 49,000× and with a pixel size 
of 2.21 Å. Micrographs of random conical tilt (RCT) pairs were taken at 50° 
and 0° tilt angles at a nominal magnification of 49,000×. 

Cryo-electron microscopy. The detergent C12E10 was chosen for cryo-electron 
microscopy analysis because it produced slightly better micrographs 
(Extended Data Fig. 2). Aliquots of 4 μl detergent-solubilized (C12E10) 
Piezo1 at a concentration of 0.2 mg ml⁻¹ were applied to glow-discharged 
300-mesh Quantifoil R2/2 grids (Quantifoil, Micro Tools GmbH, Germany) 
coated with a self-made continuous thin carbon film. After 15 s of waiting 
time, grids were blotted for 1.5 s and plunged into liquid ethane using an 
FEI Mark IV Vitrobot operated at 4 °C and 100% humidity. Grids were 
examined using a TF20 microscope (FEI) operated at 200 kV with a nominal 
magnification of 62,000× and images were captured on a CCD camera (Gatan) 
under low-dose conditions. High-resolution images were captured on a Titan 
Krios microscope, operated at 300 kV, with a K2 Summit direct electron 
detector (Gatan) in counting mode. Data acquisition was performed using 
UCSF-Image4 (X. Li and Y. Cheng), with a nominal magnification of 22,500×, 
which yields a final pixel size of 1.32 Å at object scale, and with 
defocus ranging from −1.7 μm to −2.9 μm. The dose rate on the detector was 
about 8.2 counts per pixel per second, with a total exposure time of 8 s. 
Each micrograph stack consists of 32 frames. 
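
The acquisition parameters quoted above determine the accumulated dose per unit area and the time per frame. The short calculation below is only a sketch: it assumes that the quoted detector counts can be converted directly to counts per Å² using the 1.32 Å pixel size, it applies no coincidence-loss correction, and the function name `dose_per_area` is hypothetical.

```python
# Acquisition parameters as quoted in the text.
PIXEL_SIZE_A = 1.32   # final pixel size at object scale, in angstroms
DOSE_RATE = 8.2       # counts per pixel per second on the detector
EXPOSURE_S = 8.0      # total exposure time in seconds
N_FRAMES = 32         # frames per micrograph stack


def dose_per_area(dose_rate, exposure_s, pixel_size_a):
    """Accumulated counts per square angstrom (hypothetical helper).

    Counts per pixel are divided by the area one pixel covers on the
    specimen; this treats detector counts as a proxy for electrons.
    """
    return dose_rate * exposure_s / pixel_size_a ** 2


total_dose = dose_per_area(DOSE_RATE, EXPOSURE_S, PIXEL_SIZE_A)
dose_per_frame = total_dose / N_FRAMES
frame_time_s = EXPOSURE_S / N_FRAMES

print(f"total dose     : {total_dose:.1f} counts per A^2")
print(f"dose per frame : {dose_per_frame:.2f} counts per A^2")
print(f"frame time     : {frame_time_s:.2f} s")
```

Under these assumptions the total comes to roughly 38 counts per Å², spread over frames of 0.25 s each.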

Image processing. The data sets of negative-staining electron microscopy 
were processed with EMAN2.1 (ref. 56) and RELION. Reference-free 2D 
classification was performed with RELION. The numbers of Piezo1 particles 
in the presence of C12E10, C12E8 and amphipol A8-35 are 7,279, 14,045 and 
7,565, respectively. For RCT data processing, particle picking and 
classification were performed with EMAN2.1 (ref. 56) and reconstruction of 
RCT classes and structural refinement from all untilted particles were 
performed with SPIDER. The final number of particles used in generating the 
initial model is 5,670. The initial 3D reference created using the RCT 
method is shown in Extended Data Fig. 3. 

For cryo-electron microscopy (TF20) data processing, 505 micrographs were 
processed with SPIDER and RELION. Particles were picked using SPIDER, 
manually screened (39,555 in total) and subjected to reference-free 2D classifica- 
tion using RELION. A final number of 16,729 particles were used for 3D refine- 
ment using the RCT model as initial reference. To validate the 3D model, 3D 
refinement was also performed with a Gaussian density ball as initial reference. 
During refinement, both the symmetry-free (C1) and symmetry-imposed (C3) 
reconstructions were tested (Extended Data Fig. 3d). 

For processing K2 micrographs, motion correction was applied at the 
micrograph level using the dosefgpu_driftcorr program (developed by X. Li) 
to produce average micrographs across all frames. Micrograph screening, 
particle picking and normalization were performed with SPIDER. The program 
CTFFIND3 (ref. 61) was used to estimate the contrast transfer function 
parameters. The 2D and 3D classification and refinement were performed 
with RELION exclusively to avoid potential structural overfitting. 
Classification of raw cryo-electron microscopy particles resulted in 
well-resolved 2D class averages, with many secondary structural features 
clearly discernible. In particular, on class averages of typical side 
views, many rod-like densities arranged in parallel could be readily 
identified, raising the possibility that they were transmembrane helices 
(Fig. 2c). A total of 179,805 particles from 1,042 micrographs were 
subjected to a cascade of 2D and 3D classification (Extended Data Fig. 5a). 
During 3D classification, no symmetry was imposed. Different combinations 
of particles from these classes were tested in refinement. After two 
rounds of 3D classification, a set of adequately homogeneous particles 
(30,021), which best matched the C3 symmetry, was subjected to a third 
round of 3D classification. This resulted in generally similar class 
structures, with no detectable improvement in particle homogeneity. 
Consequently, this set of particles was used for final refinement, with 
the RCT model low-pass filtered to 60 Å as initial reference. Applying the 
C3 symmetry in the refinement resulted in an overall structure at a 
resolution of 10.24 Å. After the first refinement, we noted that the 
translation parameters of particles (OriginX and OriginY in RELION) were 
rather large, with many particles having x or y shifts of more than 
15 pixels. Particles were rewindowed from original micrographs by applying 
their x and y shifts. Rewindowed particles were subjected to a second round 
of refinement using RELION, which only marginally improved the density map. 
A third round of refinement was performed by applying an enlarged soft mask 
(Extended Data Fig. 5a) of the Piezo1 channel, which improved the overall 
resolution to 6.03 Å. Last, particle-based beam-induced movement correction 
was performed by statistical movie processing in RELION, using movie frames 
2–15. This yielded a final 3D density map with an overall resolution of 
5.9 Å, with regions defined by the soft mask being 4.8 Å (Extended Data 
Fig. 5b). All reported resolutions are based on the gold-standard 
FSC = 0.143 criterion (ref. 62) and the final FSC curve (4.8 Å) was 
corrected for the effect of a soft mask using high-resolution noise 
substitution. In addition, subregion refinements, as previously described 
for ribosomal complex structural determination, were applied to improve 
the local densities of interest, by using a soft mask of the cap domain, 
the lower central pore region and a single subunit. The subsequent reported 
resolutions were still in the range of 4.8–5.5 Å, but with much-improved 
densities for these masked regions. This led to a separation of secondary 
structural elements in the cap and transmembrane regions. However, in all 
cases, the densities at the distal ‘blade’ domain were fragmented, which 
limited further quantitative analysis. Final density maps were sharpened 
by a B-factor of −100 Å² using RELION. A local resolution map was 
calculated using ResMap. UCSF Chimera was used to fit the crystal structure 
of the CED to the density map of the cap domain. 
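
As a rough illustration of the FSC = 0.143 criterion mentioned above, the sketch below computes a Fourier shell correlation curve between two half maps with NumPy and reads off the resolution at the threshold. It is not the RELION implementation: the function names, the restriction to cubic maps and the nearest-integer shell binning are assumptions made for this example, and no soft-mask correction or high-resolution noise substitution is applied.

```python
import numpy as np


def fsc_curve(half1, half2, voxel_size):
    """Fourier shell correlation between two cubic half maps.

    Returns (freqs, fsc), where freqs is the spatial frequency of each
    shell in 1/angstrom and fsc is the correlation in that shell.
    """
    assert half1.shape == half2.shape and len(set(half1.shape)) == 1
    n = half1.shape[0]
    f1 = np.fft.fftn(half1)
    f2 = np.fft.fftn(half2)

    # Radial shell index of every Fourier voxel (nearest integer).
    k = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int).ravel()

    n_shells = n // 2            # ignore shells beyond Nyquist
    keep = shell < n_shells
    idx = shell[keep]

    cross = (f1 * np.conj(f2)).real.ravel()[keep]
    p1 = (np.abs(f1) ** 2).ravel()[keep]
    p2 = (np.abs(f2) ** 2).ravel()[keep]

    num = np.bincount(idx, weights=cross, minlength=n_shells)
    den = np.sqrt(np.bincount(idx, weights=p1, minlength=n_shells)
                  * np.bincount(idx, weights=p2, minlength=n_shells))
    fsc = np.divide(num, den, out=np.zeros_like(num), where=den > 0)

    freqs = np.arange(n_shells) / (n * voxel_size)
    return freqs, fsc


def resolution_at(freqs, fsc, threshold=0.143):
    """Resolution (angstrom) of the first shell below the threshold.

    The zero-frequency shell is skipped; returns None if the curve
    never drops below the threshold.
    """
    below = np.nonzero(fsc[1:] < threshold)[0]
    if below.size == 0:
        return None
    return 1.0 / freqs[below[0] + 1]
```

In practice the two inputs would be reconstructions from independently refined half data sets; a curve computed this way from masked maps would still need the mask correction described in the text before a resolution is reported.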

Poly-alanine model and structural analysis. Main-chain tracing and building 
a poly-alanine model were done manually using Coot. Sequence alignment was 
performed using Clustal W2 (ref. 71). Secondary structures were predicted 
with PredictProtein using the full-length Piezo1 sequence. Transmembrane 
segments were predicted using multiple prediction web servers, including 
Topcons, TMHMM2 (ref. 74), HMMTOP and Phobius, with their results shown as 
green, blue, orange and pink lines, respectively, in Extended Data Fig. 9. 
Sequence alignment and secondary structure prediction of Piezo1 from 
different species were used to aid the assignment of structural elements in 
the density map. Multiple rounds of model rebuilding in Coot were performed 
for model optimization. 

Extended Data Figure 1 | Biochemical characterization of the recombinant 
protein of Piezo1-pp-GST. a, A representative trace of gel filtration 
chromatography of the Piezo1-pp-GST protein. b, Protein samples of the 
indicated fractions were subjected to SDS-PAGE and Coomassie blue staining. 
Fractions of 8.0 ml and 8.5 ml (elution volume) were used for the 
negative-staining electron microscopy and native gel analyses, respectively. 

Extended Data Figure 2 | Negative-staining electron microscopy 
examination of Piezo1 in different detergents. a, A representative 
micrograph of negatively stained Piezo1 purified with C12E10. b, 2D class 
averages of Piezo1 particles (C12E10). c, A representative micrograph of 
negatively stained Piezo1 purified with C12E8. d, 2D class averages of 
Piezo1 particles (C12E8). e, A representative micrograph of negatively 
stained Piezo1, with amphipol A8-35 as detergent. f, 2D class averages of 
Piezo1 particles (amphipol A8-35). 

Extended Data Figure 3 | Initial model of Piezo1 generated from the 
random conical tilt method and validation of the model using cryo-electron 
microscopy data from a TF20 microscope. a, b, Representative micrographs 
of negatively stained Piezo1 in C12E10 collected in random conical tilt 
(RCT) pairs (a, untilted and b, 50° tilted). c, Top view of an RCT 
reconstruction, showing an overall threefold symmetry for the Piezo1 
complex, is shown on the left. The right-hand side shows the top view of 
the refined model, obtained by a structural refinement of all particles 
from untilted micrographs. d, e, Model validation was performed by 
refinement of cryo-electron microscopy particles collected with TF20, with 
a Gaussian ball (d) or the RCT model (e) as initial reference. The 3D 
volumes are shown in top, side and bottom views. During the refinement, 
both the symmetry-free (C1) and symmetry-imposed (C3) reconstructions were 
tested. Note that some of these reconstructions have incorrect handedness. 

Extended Data Figure 4 | Representative raw particles of Piezo1 collected 
with the Titan Krios electron microscope fitted with a K2 electron 
detector. A collection of raw particles of Piezo1 (eluted with C12E10), 
collected with Titan Krios (300 kV). 

Extended Data Figure 5 | Workflow of 3D classification of Piezo1 particles. 
a, The schematic diagram of a series of 3D classification procedures with 
RELION is shown (also see Methods). After several rounds of 2D 
classification, the remaining 120,000 particles were subjected to three 
rounds of 3D classification without imposing any symmetry. A final set of 
particles (class 4 after the second round of 3D classification), with its 
reconstruction best matching threefold symmetry, was subjected to 3D 
refinement (C3 imposed). Notably, further 3D classification of this class 
resulted in generally similar structures (vertically arranged panels) 
without detectable improvement of conformational homogeneity. A top view of 
the soft mask used in structural refinement is also shown (yellow). 
b, Distribution of particle orientations in the last iteration of the 
refinement. c, Gold-standard Fourier shell correlation (FSC) curves of the 
final density map. The FSC curves were calculated with (red) or without 
(blue) the application of a soft mask to the two half-set maps. The final 
FSC curve (red) was corrected for the soft-mask-induced effect. Reported 
resolutions were based on the FSC = 0.143 criterion. 

Extended Data Figure 6 | The trimeric CEDs form the cap domain of 
Piezo1. a, A representative micrograph of negatively stained Piezo1(ΔCED) 
in C12E10. b, 2D class averages of negatively stained Piezo1(ΔCED) 
particles. It is evident that the central cap domain is absent from these 
average images. c, Sequence alignment of the CED region of Piezo1 from Mus 
musculus and Caenorhabditis elegans. Identical residues are highlighted in 
blue. Secondary structures are indicated by cartoons above the primary 
sequence. Sequence alignment was performed using Clustal W2 (ref. 71). 
d, Structure alignment of the trimeric CED of Piezo1 from M. musculus and 
C. elegans. The three CEDs are coloured in purple, cyan and green, 
respectively. The CED of C. elegans is coloured in orange. e, A 
representative trace of gel filtration of the CED of Piezo1. The molecular 
weights are labelled. Protein samples of the indicated fractions were 
subjected to SDS-PAGE and Coomassie blue staining (bottom). f, Transparent 
surface representation of the segmented density map of the cap, 
superimposed with the trimeric CED crystal structure. The trimeric CEDs 
are coloured as in d. 

Extended Data Figure 7 | Local resolution map of the final density map. 
a, The final 3D density map of Piezo1 is coloured according to the local 
resolutions estimated by the software ResMap. The density map is shown in 
three different views (top, bottom and side, respectively). b, The final 3D 
density map (transparent) is superimposed with a poly-alanine model and the 
crystal structure of the trimeric CED. Three protomers are coloured cyan, 
purple and green, respectively. 

Extended Data Figure 8 | Density connections between the transmembrane 
helices and between the helices in the compact CTD. a, Alanine models of 
five representative pairs of transmembrane helices are displayed with their 
densities (mesh) superimposed. The transmembrane region is highlighted by a 
light purple shade with the intracellular and extracellular sides 
indicated. b, An alanine model of the anchor motif with its density 
superimposed (mesh). Four helices (α1^anchor–α4^anchor) connecting PH1 and 
OH are labelled. The transmembrane region is highlighted by a light purple 
shade with the intracellular and extracellular sides indicated. c, An 
alanine model of the last four helices (α1^CTD–α4^CTD) of the trimeric CTD, 
superimposed with the density of the CTD (mesh). 

Extended Data Figure 9 | Secondary structure analyses of the C-terminal 
segments of Piezo1 proteins from different species. Sequence alignment of 
the C-terminal regions of Piezo1 from different species. The alignment was 
performed using Clustal W2 (ref. 71). The anchor motif and the CTD are 
highlighted in green and pink, respectively. For clarity, the sequences of 
CEDs were omitted and are indicated by red dashed lines. Secondary 
structures (α-helices) predicted with PredictProtein are shown as black 
lines. Transmembrane segments were predicted using multiple web servers 
including Topcons (green lines), TMHMM2 (ref. 74) (blue lines), HMMTOP 
(orange lines) and Phobius (pink lines). 

Extended Data Table 1 | Statistics of data collection and structure refinement. 


Data collection 
Diffraction beam 
Space Group 
Values in parentheses are for the highest resolution shell. $R_{\mathrm{merge}} = \sum_h \sum_i |I_{h,i} - I_h| / \sum_h \sum_i I_{h,i}$, where $I_h$ is the mean intensity of the $i$ observations of symmetry-related reflections of $h$. $R = \sum |F_{\mathrm{obs}} - F_{\mathrm{calc}}| / \sum F_{\mathrm{obs}}$, where $F_{\mathrm{calc}}$ is 
the calculated protein structure factor from the atomic model ($R_{\mathrm{free}}$ was calculated with 5% of the reflections selected). I-SAD, single-wavelength anomalous dispersion of I atoms; BSRF, Beijing Synchrotron 
Radiation Facility; SSRF, Shanghai Synchrotron Radiation Facility. 
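
The merging R factor defined in the table footnote can be written out as a few lines of Python. This is a minimal sketch, assuming the observations are supplied as (reflection index, intensity) pairs already reduced to unique reflections; the function name `r_merge` and the input layout are hypothetical and do not correspond to the HKL2000 or CCP4 programs used in the study.

```python
from collections import defaultdict


def r_merge(observations):
    """Compute sum_h sum_i |I_hi - <I_h>| / sum_h sum_i I_hi.

    observations: iterable of (hkl, intensity) pairs, where hkl is any
    hashable label of the unique reflection (assumed input layout).
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)          # <I_h> for this reflection
        numerator += sum(abs(v - mean) for v in values)
        denominator += sum(values)
    return numerator / denominator


# Toy data: two unique reflections, each measured twice.
example = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
           ((0, 1, 0), 50.0), ((0, 1, 0), 50.0)]
print(f"R_merge = {r_merge(example):.4f}")
```

The toy data are invented for illustration only; they are not values from Extended Data Table 1.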

Episodic molecular outflow in the very young 
protostellar cluster Serpens South 


Adele L. Plunkett, Héctor G. Arce, Diego Mardones, Pieter van Dokkum, Michael M. Dunham, Manuel Fernández-López, 
José Gallardo & Stuartt A. Corder 


The loss of mass from protostars, in the form of a jet or outflow, is 
a necessary counterpart to protostellar mass accretion. Outflow 
ejection events probably vary in their velocity and/or in the rate 
of mass loss. Such ‘episodic’ ejection events have been observed 
during the class 0 protostellar phase (the early accretion stage), 
and continue during the subsequent class I phase that marks the 
first one million years of star formation. Previously observed 
episodic-ejection sources were relatively isolated; however, the 
most common sites of star formation are clusters. Outflows link 
protostars with their environment and provide a viable source of 
the turbulence that is necessary for regulating star formation in 
clusters, but it is not known how an accretion-driven jet or outflow 
in a clustered environment manifests itself in its earliest stage. This 
early stage is important in establishing the initial conditions for 
momentum and energy transfer to the environment as the protostar 
and cluster evolve. Here we report that an outflow from a young, 
class 0 protostar, at the hub of the very active and filamentary 
Serpens South protostellar cluster, shows unambiguous 
episodic events. The ¹²C¹⁶O (J = 2–1) emission from the protostar 
reveals 22 distinct features of outflow ejecta, the most recent having 
the highest velocity. The outflow forms bipolar lobes, one of the 
first detectable signs of star formation, which originate from the 
peak of 1-mm continuum emission. Emission from the surrounding 
C¹⁸O envelope shows kinematics consistent with rotation and an 
infall of material onto the protostar. The data suggest that episodic, 
accretion-driven outflow begins in the earliest phase of protostellar 
evolution, and that the outflow remains intact in a very clustered 
environment, probably providing efficient momentum transfer for 
driving turbulence. 

We used the Atacama Large Millimeter/sub-millimeter Array 
(ALMA) in Chile to observe the J = 2–1 emission line of carbon 
monoxide isotopologues (¹²CO, ¹³CO and C¹⁸O) near the class 0 source 
CARMA-7 (hereafter C7), in the young protostellar cluster Serpens 
South. C7 is the strongest of several millimetre-wavelength continuum 
sources that are densely packed within Serpens South, located at a 
distance of 415 parsecs (pc) from Earth. Its relative proximity to 
Earth allows for observations with high spatial resolution; our 
observations resolve features with physical sizes of greater than about 
370 astronomical units (AU). 

The ¹²CO emission extends north-south of C7, spanning about 80″ 
(or 0.16 pc) along an axis with a position angle of roughly 4° (Fig. 1). 
The emission is clumpy, and the strongest emission features to the 
north (south) are mostly redshifted (blueshifted), relative to the 
systemic cloud velocity (Vc) of 8 km s⁻¹ (refs 20, 21). The emission 
features near the origin are only around 1–2″ (about 400–800 AU) wide, 
and the width increases to about 8″ (roughly 3,000 AU) at the widest 
point. The opening angle of the emission decreases with velocity, 
with a maximum of about 23° (at 10″, or 0.02 pc, from the source) 
at velocities of a few kilometres per second, and a minimum of about 
10° at the same distance and higher velocities. Figure 2a shows the 
position-velocity diagram, with a saw-like pattern along the extent of 
the ¹²CO emission; emission features corresponding to the highest 
velocities (|VLSR − Vc| ≈ 20 km s⁻¹, where VLSR is the local standard 
of rest velocity) are found closest to C7 (Fig. 2a, b). The ¹²CO emission 
is optically thick (especially, according to our data, near the cloud 
velocity) and therefore it traces outflow features with velocities greater 
than a few kilometres per second with respect to Vc. 
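
The angular and physical sizes quoted above are related by the small-angle approximation: at a distance of d parsecs, an angle of θ arcseconds subtends θ × d astronomical units. The sketch below checks the quoted conversions at the adopted distance of 415 pc; the function names are hypothetical and the AU-per-parsec constant is rounded.

```python
AU_PER_PARSEC = 206264.806  # also the number of arcseconds in one radian
DISTANCE_PC = 415.0         # adopted distance to Serpens South, from the text


def arcsec_to_au(theta_arcsec, distance_pc):
    """Projected size in AU under the small-angle approximation."""
    return theta_arcsec * distance_pc


def arcsec_to_pc(theta_arcsec, distance_pc):
    """Projected size in parsecs under the small-angle approximation."""
    return arcsec_to_au(theta_arcsec, distance_pc) / AU_PER_PARSEC


for theta in (0.9, 1.0, 2.0, 8.0, 10.0, 80.0):
    size_au = arcsec_to_au(theta, DISTANCE_PC)
    size_pc = arcsec_to_pc(theta, DISTANCE_PC)
    print(f"{theta:5.1f} arcsec -> {size_au:8.1f} AU = {size_pc:.3f} pc")
```

With these inputs, 80″ corresponds to about 0.16 pc, 10″ to about 0.02 pc, and 0.9″ to roughly 370 AU, consistent with the rounded values in the text.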

Figure 1 | CO molecular outflow emission centered at the class 0 
protostar CARMA-7 (C7). C7 is marked by the yellow cross at right 
ascension RA = 18 h 30 min 04.1 s, declination dec. = −02° 03′ 02.6″. The 
numbers on the x axes are truncated to show seconds only, omitting hours 
and minutes for brevity. The y axes are likewise simplified. a, c, High-velocity 
blueshifted and redshifted channels, respectively. b, Low-velocity channels, 
to show the cavity surrounding collimated ejecta. Contours in a and c begin 
with 8σ and increment by 4σ and 8σ, respectively. Labels B1–B11 and 
R1–R11 indicate 22 ejecta features. The grey lines mark the 4° position angle 
of the C7 outflow lobes. The yellow ‘plus’ symbol marks a neighbouring 
protostar, CARMA-6 (ref. 21), which provides contaminating emission, 
especially for the blueshifted southern outflow lobe. 
Figure 2 | Outflow ejecta from C7. a, Position–velocity diagram along the 
outflow axis (see Fig. 1). Points correspond to velocity maxima where we 
identified outflow knots. Northern emission features are mostly redshifted 
and are shown in red; southern emission features are mostly blueshifted and 
are shown in blue. The scale bar shows 370 AU, or 0.9”. The dashed pink line 
marks the location of the protostar; the dashed green line shows the cloud 
central velocity, V_c, in the same units as those of V_LSR. b, Knot velocity (V_flow) 


The C¹⁸O emission is optically thin and therefore probes deeper than 
does the ¹²CO emission, to trace denser material that is closer to the 
protostar (see Fig. 3 and the channel maps in Extended Data Fig. 1). 
Together, these molecular lines and continuum (Extended Data Fig. 2) 
trace two related components of the protostellar system: the outflow 
and the envelope. Material accretes onto a protostar from an infalling 
envelope via a disk, with the envelope providing the main mass reser- 
voir for the star. While the protostar is still obscured by the surrounding 
envelope, a bipolar outflow represents one of the first observational 
signs of star formation, and it carries away mass and angular momen- 
tum from the system. 

Our observations show two molecular outflow lobes emanating 
from the C7 envelope, and we conclusively identify an outflow-driving 
versus distance relative to C7 (in arcseconds or parsecs). Blue (red) points mark 
southern (northern) features, as in a. c, Dynamical timescales (τ_dyn) for 
each knot, with no correction for inclination angle. d, Histogram showing 
the number (N) of ejecta that have been emitted at the given times since the 
previous ejection (Δτ_dyn), with 200-year bins. e, Δτ_dyn as a function of τ_dyn 
for northern (red) and southern (blue) knots. Recent northern ejecta (solid 
points) are fit with a linear trend (orange line). 


source in this region. When this region was studied with lower-resolution 
CO observations, prevalent outflow emission from several 
young sources appeared to coincide. However, the ¹²CO emission 
traces cool (less than about 100 K) swept-up outflow material 
and provides a record of the timing history of mass-loss events for 
a given source. The C7 outflow comprises cavity walls that surround 
22 knots (observed clumps of emission from a single ejection event), 
11 to the north and 11 to the south, within 24” of the source. Beyond 
this distance, we see outflow morphology that can be attributed to 
C7, but there is contaminating cloud emission to the north and an 
interfering outflow to the south (driven by a protostar southwest of 
C7; Extended Data Fig. 2). The outflow’s high collimation, and the 
presence of redshifted and blueshifted emission coinciding along the 
Figure 3 | Protostellar envelope. a, Integrated C¹⁸O intensity (moment 0, 
greyscale) >13σ, with |V_LSR − V_c| < 3 km s⁻¹. Blue (red) contours represent 
blueshifted (redshifted) channels, with 0.4 < |V_LSR − V_c| < 3 km s⁻¹, 
beginning at 30% of peak integrated intensity (165 mJy beam⁻¹ km s⁻¹ 
and 155 mJy beam⁻¹ km s⁻¹, respectively), with increments of 10% of 
peak. The blue oval represents the beam size. b, Intensity-weighted mean 
velocity (moment 1, colour scale). Black dashed contours show integrated 
intensity (greyscale in a) with 8σ steps. Grey contours show 15%, 30% and 
80% of peak continuum emission (93.9 mJy beam⁻¹). c, Position–velocity 
diagram perpendicular to the outflow axis. Contours show levels of 4σ of the 
position–velocity intensity. Spatial and velocity resolution elements are 
shown with grey and black (solid) bars, respectively. 


5 NOVEMBER 2015 | VOL 527 | NATURE | 71 




line of sight near the protostar north and south, are consistent with 
the main outflow axis being oriented nearly in the plane of the sky. 
Low-velocity redshifted and blueshifted emissions to the south and 
north are contributed by cavities surrounding the high-velocity jet- 
like emission, and the jet may precess slightly, given the slight wiggle 
in the knots shown in Fig. 1a, c. 

Clumpy ¹²CO emission suggests an episodic ejection mechanism, 
rather than a smooth outflow. Decreasing knot velocities with distance 
from C7 are consistent with the existence of jet-entrained material 
that is slowed down by drag because of interaction with the 
surrounding medium, and/or with the existence of intrinsically variable 
ejections (refs 23–25); both probably contribute to the position–velocity trend 
seen here. The ‘superjet’ HH34 (driven by the class I source HH34 IRS) 
also shows a velocity decrease, which is proposed to be caused by the 
drag-induced slowing of jet-entrained material. However, the shapes of 
the position–velocity curves for HH34 and C7 differ, a difference that 
may be explained by the relative ages and precession of the sources. The 
initial C7 ejecta probably cleared some of the dense ambient material, 
reducing the drag forces for later ejecta following closely behind and 
in line with previous ejecta. HH34 is more evolved and is precessing to 
a greater extent, so ejecta seem to be more directly exposed to ambient 
material, which has not yet been disturbed by previous ejections. 

We also find that the velocities of southern (blueshifted) knots from 
C7 are consistently lower than the velocities of northern (redshifted) 
knots at comparable distances. This may be evidence for an inhomoge- 
neous ambient cloud medium, such that the southern knots are being 
slowed down by a denser environment. Alternatively, the jets may be 
intrinsically variable upon ejection from opposite sides of the disk. It 
is also possible that the outflow lobes have different inclination angles 
with respect to the plane of the sky, so that the line-of-sight veloci- 
ties to the north and south differ. C7 may precess slightly, given that 
blueshifted emission near C7 shifts to being predominantly red farther 
from the source. 

In Fig. 2c we show dynamic timescales for each of the identified 
ejecta, ranging from 100 years to 6,000 years (for knots within 24”, or 
10,000 AU, of the source). The dynamic timescale for each ejection is 
given by τ_dyn = (D/V_flow)(cos i/sin i), where D is the distance between 
an outflow knot and the driving source, V_flow is the velocity (along the 
line of sight) of the knot, and i is the inclination of the outflow with 
respect to the line of sight. Uncertainties arise because we do not know 
the inclination angle, and because we assume that the knots travel with 
constant velocity from the time of their launch. If a jet is launched 
from the disk?, then the longest timescale of an (unimpeded) outflow 
ejection should be a lower limit for the formation time of the disk. The 
longest timescale of a northern ejection is about 5,000 years; correcting 
for an inclination angle nearly in the plane of the sky, this could be 
smaller by a factor of about 10 (for i ≈ 85°) or more, which is 
consistent with the youthfulness of the source. Given that southern knots 
appear to have lower velocities than northern knots, the timescales of 
the southern (blueshifted) knots are longer on average than those of the 
northern (redshifted) knots. 
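
The size of the inclination correction can be checked with a short calculation. The sketch below evaluates τ_dyn = (D/V_flow)(cos i/sin i) for a hypothetical knot; the distance and velocity are illustrative values, not measurements from this work, and the function name is an assumption.

```python
import math

AU_KM = 1.495978707e8   # kilometres per astronomical unit
YEAR_S = 3.15576e7      # seconds per Julian year


def dynamical_timescale_yr(d_au, v_los_kms, inclination_deg=None):
    """Dynamical timescale in years.

    d_au            projected distance of the knot from the source (AU)
    v_los_kms       line-of-sight knot velocity (km/s)
    inclination_deg inclination with respect to the line of sight;
                    None returns the uncorrected value D / V_flow
    """
    tau = d_au * AU_KM / v_los_kms / YEAR_S
    if inclination_deg is not None:
        i = math.radians(inclination_deg)
        tau *= math.cos(i) / math.sin(i)
    return tau


if __name__ == "__main__":
    # Hypothetical knot: 10,000 AU from the source, moving at 10 km/s.
    raw = dynamical_timescale_yr(10_000, 10.0)
    corrected = dynamical_timescale_yr(10_000, 10.0, inclination_deg=85.0)
    print(f"uncorrected: {raw:.0f} yr, corrected (i = 85 deg): {corrected:.0f} yr")
    print(f"reduction factor: {raw / corrected:.1f}")
```

For i = 85° the factor cos i/sin i is about 0.09, which reproduces the reduction by roughly an order of magnitude mentioned in the text.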

We quantify the episodic nature of the ejections, and corresponding 
accretion and/or disk instabilities, on the basis of the difference in 
timescales, Δτ_dyn, for successive ejection events (Fig. 2d). Because of the 
contamination from the surrounding outflow emission to the south, 
we base the following calculations on the northern lobe only (within 
24” of C7). In Fig. 2e, we see that seven knots to the north show linearly 
increasing Δτ_dyn as a function of τ_dyn, with Δτ_dyn ranging from 80 years 
to 540 years, and a mean Δτ_dyn of 310 ± 150 years. These seven knots 
are the most recently ejected to the north, with a τ_dyn of less than 2,400 
years (uncorrected for inclination angle). Several modes of velocity 
variability have been suggested for protostellar jets, with periods 
of a few tens, a few hundreds, and a few thousands of years; in the case 
of a class 0 source, and assuming that C7 has an inclination approxi- 
mately in the plane of the sky, we are probably witnessing ejecta that 
are associated mostly with the shorter period modes. 
We estimate that, within some 3,000 years, the farthest (slowest) 
ejecta in the north and south will have been overcome by each of 
the following (faster) ejecta, if ejecta travel with constant velocities 
(an admittedly simple assumption). These interactions will produce 
bright shocks along the outflow. ‘Snapshots’ of shocks in outflows can 
be seen in the emission of molecular hydrogen (H₂), which has a 
higher excitation temperature than does ¹²CO but cools quickly. 
Two H₂ bow-shaped shock structures, corresponding to faint, low-velocity 
¹²CO emission lines 38” (0.08 pc) and 47” (0.09 pc) north and 
south of C7, respectively, are seen in the Spitzer 4.5-μm map of the 
region. These structures may be evidence of the first occurrence of a 
longer-period mode (of a few hundred years or more), where faster 
ejecta recently overcame slower ejecta. We propose that frequent ejec- 
tion bursts during the class 0 phase entrain molecular outflow material, 
which therefore appears clumpy, creating observable shocks when the 
ejecta overtake previous ejecta. 

Alternatively, if the position–velocity trend provides evidence for an 
interaction between ejecta and the environment, then the drag-induced 
momentum loss along the outflow signifies momentum transfer to 
the environment, an important mechanism that is proposed to drive 
turbulent motions in a clustered region. We are carrying out further 
analysis of momentum injection along the span of the outflow at such 
an early stage, taking into account velocity-dependent opacity of the 
¹²CO line and varying excitation temperatures throughout the outflow. 

The C¹⁸O envelope seems to be oriented perpendicular to the 
outflow axis, with its major axis approximately east–west. Elongated 
blueshifted and redshifted C¹⁸O emissions east and west of C7, 
respectively, are evidence of a non-spherical, rotating envelope. Blueshifted 
and redshifted peaks of high-velocity emission near C7 to the south 
and north, respectively, are consistent with infall motion onto a disk 
that is slightly inclined. 

Two features in the C¹⁸O position–velocity diagram are 
representative of some contribution from unresolved Keplerian rotation (on 
scales of less than ~400 AU): larger velocities at smaller distances, and 
position–velocity intensity peaks offset bluewards and redwards from 
the line V_LSR = V_c. The position–velocity structure for C7 is consistent 
with a combination of rotation and infall on a slightly inclined disk, as 
shown in models and sketched in Extended Data Fig. 3. However, the 
C¹⁸O position–velocity diagram (Fig. 3c) also shows some deviations 
from models of a rotating, infalling envelope: first, the blueshifted peak 
is stronger than the redshifted peak; second, redshifted emission with 
velocities V_LSR − V_c ≈ 0.5–1 km s⁻¹ coincides with strong blueshifted 
emission at an offset of about −2”; and third, redshifted extended 
emission west of C7 probably contaminates the C7 envelope emission. The 
outflow and envelope that we observe here clearly pertain to the same 
protostar, and higher-resolution observations of the disk and envelope 
will reveal the jet-launching region and disk-formation mechanisms 
in this young system. 

METHODS 
Observations and data analysis. The analysis is based on ALMA Cycle 1 
observations made with the 12-metre and 7-metre arrays during March 2014 and January 
to June 2014. The observed mosaics span 2’ × 3’ and consist of 137 and 53 
pointings, separated by 15” and 26”, made by the 12-metre and 7-metre arrays, 
respectively. Here, we focus on the roughly 90” × 20” region centred at RA = 18 h 30 min 
04.1 s, dec. = −02° 03’ 02.6”. 

The ALMA correlator was configured in the frequency division mode 
(FDM) of band 6 with four independent spectral windows: one window was 
assigned to the J = 2–1 energy-level transition of each of the spectral lines 
¹²CO (230.538 GHz), ¹³CO (220.399 GHz) and C¹⁸O (219.560 GHz), and the 
fourth was dedicated to continuum at 231.450 GHz. The bandwidth for each 
spectral-line window was 234.375 MHz, and the continuum window had a 
bandwidth of 468.750 MHz. To make a continuum-emission map, we included 
line-free channels in all spectral windows, resulting in a total continuum 
bandwidth of 996 MHz. The molecular line data for ¹²CO and C¹⁸O, as well as the 
continuum, are included in the present analysis. 
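
As a rough guide to what these settings mean in velocity space, the bandwidth of a spectral-line window can be converted to a velocity span with the radio Doppler convention, Δv = c Δν/ν₀. The sketch below is illustrative only; the helper name and the dictionary of rest frequencies are assumptions introduced here, using the frequencies quoted above.

```python
C_KMS = 299792.458  # speed of light in km/s

# Rest frequencies (GHz) of the J = 2-1 lines quoted in the text.
LINES_GHZ = {"12CO": 230.538, "13CO": 220.399, "C18O": 219.560}


def velocity_width_kms(bandwidth_hz, rest_freq_hz):
    """Velocity span of a frequency interval (radio convention)."""
    return C_KMS * bandwidth_hz / rest_freq_hz


if __name__ == "__main__":
    bandwidth_hz = 234.375e6  # spectral-line window bandwidth from the text
    for name, freq_ghz in LINES_GHZ.items():
        span = velocity_width_kms(bandwidth_hz, freq_ghz * 1e9)
        print(f"{name}: window covers about {span:.0f} km/s")
```

Each line window therefore spans a few hundred kilometres per second, comfortably wider than the roughly ±20 km s⁻¹ outflow velocities discussed in the main text.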

We performed calibration of the raw visibility data with the Common Astronomy 
Software Applications package (CASA, version 4.3.0), using the standard reduction script 
for Cycle 1. We assigned weights to the measurement sets using the task ‘statwt’ 
and combined the calibrated 12-metre and 7-metre array UV data using the task 
‘concat’. 

We created image cubes for each molecular line, as well as the continuum image, 
by first applying a Fourier transform to the calibrated data, producing an inter- 
mediate (‘dirty’) image. Using the intermediate image, we drew masks around the 
emission features, and these masks were used in an interactive ‘clean’ process to 
deconvolve the telescope point-spread function. We used Briggs weighting with 
a robust parameter of 0.5, and we imaged with a cell size of 0.3” and a spectral 
(velocity) resolution of 0.16 km s⁻¹. Finally, we subtracted continuum emission 
from the spectral-line data using the task ‘imcontsub’. 

The resulting beam sizes for the ¹²CO and C¹⁸O data cubes are 0.9” × 0.6” 
(with position angles of 79.7° and 76.3° for ¹²CO and C¹⁸O, respectively). The 
root-mean-squared (r.m.s.) noise levels are 9 mJy beam⁻¹ channel⁻¹ and 
8 mJy beam⁻¹ channel⁻¹, respectively, with channel widths of 0.16 km s⁻¹. The 
r.m.s. noise level for the continuum is 0.2 mJy beam⁻¹ near the edge of the region 
presented here, with an upper-limit r.m.s. noise level of 0.3 mJy beam⁻¹ within 
30” of the strong continuum emission. 

C¹⁸O channel maps. The C¹⁸O emission, shown in Extended Data Fig. 1, is 
concentrated where the northern redshifted and southern blueshifted 
¹²CO emissions meet. The C¹⁸O morphology changes from extended at 
the lowest velocities (|V_LSR − V_c| = 0–0.7 km s⁻¹) to compact and oriented 
approximately north–south, or coincident with the outflow axis, at higher 
velocities (|V_LSR − V_c| = 1.3–1.7 km s⁻¹). At intermediate velocities 
(|V_LSR − V_c| = 0.6–1 km s⁻¹), elongated blueshifted and redshifted emissions are 
seen east and west of C7, respectively. A shell in the C¹⁸O emission in the south, 
and less noticeably in the north, is seen bisected by the ¹²CO axis in Fig. 3a, b. 
This is similar to the situation with the protostar HH212 (ref. 30), and in both 
cases material originally in the envelope is probably swept up to form the cavity. 
It may be too early for the outflow to have a noticeable impact on the infall and 
rotation motions of the envelope. 

Continuum emission. The continuum emission peaks in our map at RA = 18 h 
30 min 04.1 s, dec. = −02° 03’ 02.6” (see Extended Data Fig. 2), with an 
intensity of 93.9 mJy beam⁻¹, and this coincides with the centre of the C¹⁸O 
emission (Fig. 3). Although the highest-intensity continuum emission (greater than 
~50σ) is concentrated and can be fit well with a two-dimensional Gaussian 
curve, the weaker (yet statistically significant) continuum emission is elon- 
gated northwest-southeast. Additional continuum emission from the nearby 
protostar CARMA-6 may contribute to the extended continuum emission, and 
molecular outflow emission is also associated with this source (although not 
shown here). 


30. Lee, C.-F. et al. ALMA results of the pseudodisk, rotating disk, and jet in the 
continuum and HCO⁺ in the protostellar system HH 212. Astrophys. J. 786, 
114 (2014). 
Extended Data Figure 1 | C¹⁸O emission from the protostellar source C7. Top row, blueshifted emission; bottom row, redshifted emission; velocity 
increases from left to right. Contours begin at 4σ and increment by 4σ. Specific velocity ranges (|V_LSR − V_c|, or velocity relative to cloud velocity) are given 
for each column. Each panel shows integrated emission from two channels. The location of peak continuum emission is marked with a magenta cross. 
Extended Data Figure 2 | 1-mm continuum emission near the sources CARMA-7 (RA = 18 h 30 min 04.1 s, dec. = −02° 03’ 02.6”) and CARMA-6 
(RA = 18 h 30 min 03.5 s, dec. = −02° 03’ 08.4”). Contours show 10σ, 30σ, 50σ and 70σ, followed by increments of 50σ. Near these strong sources, we find 
the r.m.s. noise to be 0.3 mJy beam⁻¹. 
Extended Data Figure 3 | Cartoon depiction of a protostellar system, showing the outflow (¹²CO emission), envelope (C¹⁸O emission) and disk 
(unresolved). Contributions to blueshifted and redshifted molecular line emission are indicated along the outflow and envelope, assuming that the outflow 
Hong–Ou–Mandel interference of two phonons in trapped ions 


Kenji Toyoda¹, Ryoto Hiji¹, Atsushi Noguchi¹† & Shinji Urabe¹ 


The quantum statistics of bosons and fermions manifest themselves 
in the manner in which two indistinguishable particles interfere 
quantum mechanically. When two photons, which are bosonic 
particles, enter a beam-splitter with one photon in each input 
port, they bunch together at either of the two output ports. The 
corresponding disappearance of the coincidence count is the Hong–Ou–Mandel 
effect (ref. 1). Here we show the phonon counterpart of this effect 
in a system of trapped-ion phonons, which are collective excitations 
derived by quantizing vibrational motions that obey Bose-Einstein 
statistics. We realize a beam-splitter transformation of the phonons 
by employing the mutual Coulomb repulsion between ions, and 
perform a two-phonon quantum interference experiment using 
that transformation. We observe an almost perfect disappearance 
of the phonon coincidence between two ion sites, confirming that 
phonons can be considered indistinguishable bosonic particles. The 
two-particle interference demonstrated here is purely a quantum 
effect, without a classical counterpart, hence it should be possible 
to demonstrate the existence of entanglement on this basis. We 
attempt to generate an entangled state of phonons at the centre 
of the Hong-Ou-Mandel dip in the coincidence temporal profile, 
under the assumption that the entangled phonon state is successfully 
generated if the fidelity of the analysis pulses is taken into account 
adequately. Two-phonon interference, as demonstrated here, proves 
the bosonic nature of phonons in a trapped-ion system. It opens 
the way to establishing phonon modes as carriers of quantum 
information in their own right (refs 2–4), and could have implications 
for the quantum simulation of bosonic particles and analogue 
quantum computation via boson sampling. 

When two photons with the same wave-packet temporal profiles 
and polarization are made to interfere at a 50:50 beam-splitter, the 
coincidence between the photon detection at the two output ports 
disappears. This Hong–Ou–Mandel (HOM) effect has also 
been observed for atoms (refs 13, 14), and a related effect has been noted 
in the case of fermions. The HOM effect and the underlying 
mechanisms for the generation and interference of indistinguishable 
particles not only reveal the fundamental natures of these particles, 
but also enable large-scale quantum information processing (QIP). 
The HOM effect has also been used to generate entanglement 
between two atomic ions separated by one metre. 

In research into QIP using trapped ions, phonons have usually played 
a supporting role in mediating interactions between internal-state 
qubits or pseudo-spins. They are also expected to play a central 
role in simulating bosonic-particle systems. As a crucial step towards 
phonon-based applications, the coupling of multiple vibrational modes 
of ions at a single quantum level has been realized. Phonon indistin- 
guishability is another key component necessary for these applications, 
but this has not been explicitly demonstrated previously. 

A system of trapped ions presents an ideal environment for QIP and 
quantum simulation. In this study, the almost perfect matching of radial 
frequencies among different sites in a linear trap and the near perfect 
preparation of initial states by sideband cooling assure the exact 
indistinguishability of the phonons in this system. 

For two ions in a linear Paul trap, the hopping Hamiltonian for the 
local phonon operators is expressed as (see Methods for details) 

\hat{H} = \frac{\hbar\kappa}{2}\left(\hat{a}_1^{\dagger}\hat{a}_2 + \hat{a}_1\hat{a}_2^{\dagger}\right) \quad (1)

Here, κ is the hopping rate of the radial phonons between two sites, ħ 
is h/2π where h is the Planck constant, and â_i and â_i† are the annihilation 
and creation operators of the radial phonons at the ith site, respectively. 
From this Hamiltonian, the propagator can be calculated and the local 
phonon operators for sites 1 and 2 in the Heisenberg picture are 
expressed as (for a detailed derivation, see Methods) 

\hat{a}_1(t) = \hat{a}_1\cos\frac{\kappa t}{2} - i\,\hat{a}_2\sin\frac{\kappa t}{2} \quad (2)

\hat{a}_2(t) = -i\,\hat{a}_1\sin\frac{\kappa t}{2} + \hat{a}_2\cos\frac{\kappa t}{2} \quad (3)

When time t = T_hop/4, where T_hop = 2π/κ is the hopping period, this 
transformation corresponds to a 50:50 beam-splitter in linear optics. 

With the transformation given above, the initial product state |1⟩₁|1⟩₂ 
is transformed to −i(|2⟩₁|0⟩₂ + |0⟩₁|2⟩₂)/√2, where |n⟩_i represents the 
phonon Fock state of the ith ion with the quantum number n. This 
(ideal) state is an entangled state (a NOON state with N= 2), which is 
a superposition of states where phonons are bunched at either of the 
two sites. Thus, the coincidence of phonons between the two sites dis- 
appears. See Fig. 1 for a conceptual diagram of the phonon dynamics. 
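
The disappearance of the coincidence can be illustrated numerically from the two-mode transformation of equations (2) and (3). The sketch below is an idealized calculation, not the authors' simulation: it ignores decoherence and state-preparation errors, the function names are hypothetical, and κ/2π = 2 kHz is taken as a round value. For indistinguishable bosons the amplitude for finding one phonon at each site is the permanent of the 2 × 2 transformation matrix; for distinguishable excitations the two paths add as probabilities instead.

```python
import math


def mode_matrix(kappa, t):
    """2x2 transformation of the local phonon operators after time t."""
    theta = kappa * t / 2.0
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -1j * s], [-1j * s, c]]


def coincidence_bosons(kappa, t):
    """P(one phonon per site) for indistinguishable bosons starting in |1>|1>."""
    u = mode_matrix(kappa, t)
    amplitude = u[0][0] * u[1][1] + u[0][1] * u[1][0]  # permanent of u
    return abs(amplitude) ** 2


def coincidence_distinguishable(kappa, t):
    """Same quantity for distinguishable excitations (no interference)."""
    u = mode_matrix(kappa, t)
    return abs(u[0][0] * u[1][1]) ** 2 + abs(u[0][1] * u[1][0]) ** 2


KAPPA = 2.0 * math.pi * 2.0e3    # rad/s, assuming kappa/2pi = 2 kHz
T_HOP = 2.0 * math.pi / KAPPA    # hopping period in seconds

if __name__ == "__main__":
    for fraction in (0.0, 0.125, 0.25, 0.5):
        t = fraction * T_HOP
        print(f"t = {fraction:5.3f} T_hop: bosons {coincidence_bosons(KAPPA, t):.3f}, "
              f"distinguishable {coincidence_distinguishable(KAPPA, t):.3f}")
```

In this ideal model the bosonic coincidence follows cos²(κt) and vanishes at t = T_hop/4, whereas the distinguishable case only falls to 0.5 there; the gap between the two is the signature of two-particle interference.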

To observe the two-phonon interference, we use the radial modes 
of two ⁴⁰Ca⁺ ions in a linear Paul trap with secular frequencies (ω_x, 
ω_y, ω_z)/2π = (3.45, 3.20, 0.11) MHz, where x and y are the two radial 
directions and z is the axial direction. The phonon modes in one radial 
direction, y, are used here as the modes that will be manipulated and 
observed in the experiment. The distance between the two ions is 24 µm 
and the hopping rate κ/2π ≈ 2 kHz. Doppler cooling is performed by 
illuminating the system with a 397-nm laser resonant to the S1/2 ↔ P1/2 
electronic state transition and an 866-nm laser resonant to D3/2 ↔ P1/2. 
The states S1/2 (m_J = −1/2) and D5/2 (m_J = −5/2) are used as the 
internal ground and excited states for the present experiment, where m_J is 
the projection of the total electronic angular momentum. Sideband 
cooling of the motional states and manipulation of the carrier and 
sideband transitions in the y direction are performed by illuminating 
the system with a 729-nm laser resonant to the S1/2 ↔ D5/2 transition. 
An 854-nm laser resonant to D5/2 ↔ P3/2 is also used as a quenching 
laser in the sideband cooling and to clear out the population in the 
excited state. Observation of the internal states is performed using 
a photomultiplier, by illuminating the ions with the 397-nm and 
866-nm lasers. Further details of the experimental procedures are given 
in Methods and in our previous publications. 
Figure 1 | Conceptual diagrams of phonon hopping dynamics and 
two-phonon interference. a, Hopping dynamics. An initial Fock state |1⟩₁|0⟩₂ with 
one phonon is prepared at t = 0, and the phonon hops back and forth between 
the two ion sites with period T_hop = 2π/κ. b, Two-phonon interference. An 
initial Fock state, |1⟩₁|1⟩₂, with two phonons is prepared. The two phonons are 
made to hop between two sites, and at t = T_hop/4 and 3T_hop/4, the HOM effect 
is observed; thus, the phonon population coincidence between the two sites 
disappears and an entangled state, (|2⟩₁|0⟩₂ + |0⟩₁|2⟩₂)/√2, is generated. 


We first examined the hopping dynamics due to the Hamiltonian 
Ĥ for an initial Fock state |g, 1⟩₁|g, 0⟩₂. Here, |g(e), n⟩_i represents the 
basis for the internal ground (excited) state with n local phonons in the 
ion site i. The experimental procedure is as follows: (1) all the radial 
vibrational modes of the two ions are cooled to near the ground state 
via sideband cooling; thus, the initial state (|g, 0⟩₁|g, 0⟩₂) is prepared; 
(2) ion 1 is irradiated with a π pulse (duration ~19 µs), which is 
resonant with the blue-sideband transition; hence, the state is transferred 
to |e, 1⟩₁|g, 0⟩₂; (3) immediately after this, the ions are irradiated with 
an 854-nm (quenching) pulse with 30-µs duration to re-initialize the 
internal state of ion 1 to the ground state. The expected state after this 
operation is |g, 1⟩₁|g, 0⟩₂, which is used as the initial state for the 
hopping experiment; (4) a pause with no laser irradiation is permitted to 
allow the phonon system to undergo hopping; (5) a red-sideband π 
pulse is applied to the ions to map the manifold {|g, 0⟩_i, |g, 1⟩_i} to 
{|g, 0⟩_i, |e, 0⟩_i}. The 397-nm laser illumination is then used to record 
the fluorescence and, hence, the phonon state is estimated. Steps (1)–(5) 
are repeated with different pause durations in step (4) to deduce T_hop. 

Figure 2a shows the result of phonon hopping for two ions with a 
pause duration of up to 11 ms (circles). The horizontal and vertical 
axes represent the pause duration and the mean phonon number of ion 
1, respectively. The fit to a sinusoidal function with an exponentially 
decaying envelope (solid curve) gives κ = 2π × 2.05 kHz (T_hop = 489 µs) 
and an e⁻¹ decay time of 13.0 ms. 

Figure 2b shows the same result (circles) with a magnified horizontal 
axis scale. According to the fit (solid curve), at t = 57 µs (marked with 
a red vertical line) the effect of the hopping Hamiltonian corresponds 
to the transformation produced by a 50:50 beam-splitter. The dashed 
curve is a simulated result (see Methods) and reproduces the 
experimental finding both qualitatively and quantitatively. In the result 
shown in Fig. 2b, the maximum value of the phonon population does 
not reach 1. In addition, the results do not begin from this value at 
the origin of the horizontal axis. These imperfections are explained 
in Methods. 
Figure 2 | Experimental results for phonon dynamics. a, Observed phonon 
hopping dynamics. The horizontal and vertical axes represent the pause 
duration and phonon population in ion 1, respectively. The circles indicate 
experimental values, and the combination of solid and dashed curves is 
a fit with a sinusoidal function with an exponentially decaying envelope. 
The vertical line at t = 57 µs represents the point for the 50:50 beam-splitter 
transformation, which is estimated from the fitting. b, The same result as 
previously (a) with a magnified horizontal scale, for comparison with the next 
result (c), which shares the same horizontal axis. The circles are experimental 
values, the solid curve is a sinusoidal fit, and the dashed curve is a simulation 
result. The vertical line at t = 57 µs represents the point for the 50:50 
beam-splitter transformation, which is estimated from the fitting. c, Observed 
coincidence. The coincidence of the internal states after the red-sideband π 
pulse for mapping is interpreted as the phonon coincidence. The horizontal 
and vertical axes represent the pause duration and coincidence, respectively. 
The circles are experimental values, the solid curve is a sinusoidal fit, and the 
dashed curve is a simulation result. The vertical line at t = 56 µs represents 
the point for the 50:50 beam-splitter transformation, which is estimated 
from the fitting. The error bars denote the standard deviation calculated from 
the variances and covariances of multinomial distributions. The number of 
measurements per data point is 50. 


Next, we observed the coincidence of two phonons and attempted 
to generate an entangled state, having an ideal form of 
(|2⟩₁|0⟩₂ + |0⟩₁|2⟩₂)/√2. The majority of the experimental procedure 
is identical to that of the hopping experiment. The differences are that, 
in step (2), both ions 1 and 2 are irradiated with a blue-sideband π 
pulse; thus, the initial state |g, 1⟩₁|g, 1⟩₂ is prepared after step (3). 
Further, in step (5), among the three possibilities for a phonon Fock 
state having two phonons ({|g, 0⟩₁|g, 2⟩₂, |g, 1⟩₁|g, 1⟩₂, |g, 2⟩₁|g, 0⟩₂}), 
only |g, 1⟩₁|g, 1⟩₂ is transferred by a red-sideband π pulse to a state 
having two internal excitations (that is, |e, 0⟩₁|e, 0⟩₂). Then, the 397-nm 
Figure 3 | Measurement of fidelity of the (|2⟩₁|0⟩₂ + |0⟩₁|2⟩₂)/√2 state. 
a, Observed ion fluorescence. The horizontal and vertical axes represent the 
fluorescence photon counts and the number of occurrences of each photon 
count. The vertical lines represent the threshold levels used for discrimination 
of the photon counts. The solid curve is a fit with the sum of four Gaussians. 
b, Measurement of parity against the phase of the analysis π/2 pulse. The 
circles indicate experimental values and the solid curve is a sinusoidal fit. The 
crosses are simulation results, and the dashed curve is a sinusoidal fit to the 
simulated data. The simulation result reproduces the experimental result well. 
The error bars denote the standard deviation calculated from the variances 
and covariances of multinomial distributions. The number of measurements 
per data point is 50. 


laser illuminates the system in the same way as above, and the 
coincidence of the internal excitations, that is, the probability that both of the 
ions are shelved to D5/2, is estimated from the fluorescence. This 
coincidence of internal excitations is finally interpreted as the phonon 
coincidence between the two sites before the application of the 
red-sideband π pulse. 

Figure 2c shows the results of the phonon coincidence between 
the two sites (circles). According to the fit with a sinusoidal function 
(solid curve), at t ≈ 56 µs (marked with a red vertical line) the effect 
of the hopping Hamiltonian amounts to the transformation 
produced by a 50:50 beam-splitter. This point in time, t ≈ 56 µs, is very 
close to the corresponding time point in Fig. 2b (t ≈ 57 µs); hence, 
the two results are consistent in this regard. We 
can see a dip at this point, which indicates that the two ion sites 
are not simultaneously occupied by phonons. The almost perfect 
disappearance of the coincidence guarantees almost perfect 
interference between the two phonons. The dashed curve is obtained 
through simulation (see Methods) and also reproduces the 
experimental result well. 

The phonon state at the dip is expected to be an entangled
state $(|2\rangle_1|0\rangle_2 + |0\rangle_1|2\rangle_2)/\sqrt{2}$, although we cannot ignore the populations
in states such as $|0\rangle_1|0\rangle_2$, $|0\rangle_1|1\rangle_2$ and $|1\rangle_1|0\rangle_2$, which do not contribute
to the coincidence. These states may be mixed because of
imperfect preparation of the initial state, $|1\rangle_1|1\rangle_2$, by blue-sideband
pulses (see Methods for details of this imperfection). In order to confirm
the generation of the entangled state, we estimated its fidelity using
optical pulses to transform the state to $(|\uparrow\rangle_1|\uparrow\rangle_2 + |\downarrow\rangle_1|\downarrow\rangle_2)/\sqrt{2}$, where
$\{|\downarrow\rangle_i, |\uparrow\rangle_i\} = \{|g, 0\rangle_i, |e, 1\rangle_i\}$. The density-matrix components after this
transformation were obtained, which were used to calculate the fidelity 
(see Methods for details). 

Figure 3a shows the measured populations in the internal states for 
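estimation of the diagonal components of the density matrix, $\rho_{\uparrow\uparrow,\uparrow\uparrow}$ and
$\rho_{\downarrow\downarrow,\downarrow\downarrow}$. If the ions are in $|\uparrow\rangle_1|\uparrow\rangle_2$, no fluorescence should be recorded, and
if the ions are in $|\downarrow\rangle_1|\downarrow\rangle_2$, fluorescence from both of the ions should be
recorded. Therefore, the sum of the peaks around 5 and 150 on
the horizontal axis corresponds to the sum of the diagonal
components of the density matrix. The result of the measurement is
$\rho_{\uparrow\uparrow,\uparrow\uparrow} + \rho_{\downarrow\downarrow,\downarrow\downarrow} = 0.69 \pm 0.02$.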

Figure 3b shows a sinusoidal oscillation of the parity (circles) and a fit 
with a sinusoidal function (solid curve). The amplitude of the sinusoidal 
oscillation in the parity corresponds to the sum of the off-diagonal 
components of the density matrix, $\rho_{\uparrow\uparrow,\downarrow\downarrow}$ and $\rho_{\downarrow\downarrow,\uparrow\uparrow}$. The value for
the experimental parity result is $\rho_{\uparrow\uparrow,\downarrow\downarrow} + \rho_{\downarrow\downarrow,\uparrow\uparrow} = 0.35 \pm 0.05$. Therefore,
the fidelity for the $(|\uparrow\rangle_1|\uparrow\rangle_2 + |\downarrow\rangle_1|\downarrow\rangle_2)/\sqrt{2}$ state is $0.52 \pm 0.03$ (see
Methods; the confidence intervals are for a 68% confidence level). Thus,
we cannot state that this value clears the 0.5 threshold when the confidence
interval is considered. However, if we consider the reduction of
the fidelity due to the imperfections in the analysis pulses and make a
corresponding correction, the fidelity is estimated to be $0.74 \pm 0.05$ (see
equation (19) in Methods). Thus, we speculate that an entangled phonon 
state is successfully generated in the experiment. 
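
The arithmetic behind the uncorrected fidelity can be checked with a short sketch. It assumes that the fidelity is half the sum of the diagonal and off-diagonal terms, as in equation (19) of Methods, and that the two quoted uncertainties are independent. That independence is an assumption made here for illustration only; the error bars in the paper are derived from multinomial variances and covariances. The function name is hypothetical.

```python
import math

def fidelity_with_error(diag, diag_err, offdiag, offdiag_err):
    """Fidelity F = (diag + offdiag) / 2 with a simple quadrature error."""
    fidelity = 0.5 * (diag + offdiag)
    # Assumption: the two uncertainties are uncorrelated.
    error = 0.5 * math.hypot(diag_err, offdiag_err)
    return fidelity, error

# Measured sums of diagonal and off-diagonal components quoted in the text.
F, dF = fidelity_with_error(0.69, 0.02, 0.35, 0.05)
print(f"F = {F:.2f} +/- {dF:.2f}")
print("lower bound above 0.5:", F - dF > 0.5)
```

Under these assumptions the lower end of the interval falls below 0.5, which matches the cautious wording used for the uncorrected value.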

We have demonstrated two-phonon interference in a trapped-ion 
system. This is an essential step towards the realization of boson 
sampling with trapped ions. Recent advancements in the field of
cavity optomechanics have enabled control of the vibrational motion
of micro- and nanoscopic mechanical systems at the level of a single
vibrational quantum. In those systems, coupling of multiple phonon
modes may become possible, thereby enabling multi-mode phonon
dynamics, as demonstrated here. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Hopping Hamiltonian and beam-splitter transformation. We assume that two 
ions are confined in a linear Paul trap. When the harmonic confinement in the 
radial directions is significantly stronger than the Coulomb forces (the condition 
for ‘stiff modes’), the radial phonons can be regarded as local phonons confined
in each ion site. In this situation, the radial phonons behave as bosonic particles, 
their total number is conserved, and the Coulomb interaction induces hopping of 
the radial phonons between the two ion sites. 

The Hamiltonian that governs the motion in one of the radial directions (which 
we refer to as the y direction here) can be expressed as 


$$H_y = \sum_{i=1,2} \hbar\left(\omega_y - \frac{\kappa}{2}\right) a_i^\dagger a_i + \frac{\hbar\kappa}{2}\left(a_1 a_2^\dagger + a_1^\dagger a_2\right) \qquad (4)$$


where $\omega_y$ is the oscillation frequency in the y direction and

$$\kappa = \frac{e^2}{4\pi\epsilon_0 d_0^3 M \omega_y} \qquad (5)$$

is the phonon hopping rate ($\epsilon_0$ is the vacuum permittivity, $d_0 = |z_1 - z_2|$ is the
inter-ion distance in the axial direction and M is the ion mass). $a_i$ and $a_i^\dagger$ are,
respectively, the annihilation and creation operators of local phonons in the ith 
site. The first term on the right-hand side of equation (4) describes the harmonic 
oscillators associated with the ion sites, while the second term represents phonon 
hopping between the two sites. By moving to an interaction picture, where the 
trivial dynamics due to the first term are omitted, we obtain the interaction 
Hamiltonian 


$$H_I = \frac{\hbar\kappa}{2}\left(a_1 a_2^\dagger + a_1^\dagger a_2\right) \qquad (6)$$


The propagator for this Hamiltonian is 


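$$U_I(t) = \exp\left(-\frac{i H_I t}{\hbar}\right) = \exp\left[-\frac{i\kappa t}{2}\left(a_1 a_2^\dagger + a_1^\dagger a_2\right)\right] \qquad (7)$$

and the annihilation operator in the Heisenberg picture is

$$a_i(t) = U_I^\dagger(t)\, a_i(0)\, U_I(t) = \sum_{j=1,2} u_{ij}(t)\, a_j(0) \qquad (8)$$

where

$$u(t) = \begin{pmatrix} \cos(\kappa t/2) & -i\sin(\kappa t/2) \\ -i\sin(\kappa t/2) & \cos(\kappa t/2) \end{pmatrix}, \qquad a_i(0) = a_i \quad (i, j = 1, 2) \qquad (9)$$

For two ions, $H_I$ couples $|n\rangle_1|n+1\rangle_2$ and $|n+1\rangle_1|n\rangle_2$, causing phonon energies
to be exchanged between the ions at a rate $\kappa$. Here, $|n\rangle_i$ represents the phonon Fock state
of the ith ion with the quantum number n. The 50:50 beam-splitter transformation,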
which is routinely used in linear optics, can be described as 


$$b_1 = (a_1 - i a_2)/\sqrt{2} \qquad (10)$$

$$b_2 = (-i a_1 + a_2)/\sqrt{2} \qquad (11)$$

where $b_i$ (i = 1, 2) is the annihilation operator for each of the two beam-splitter
outputs. $U_I(t)$ with $t = \pi/2\kappa$ corresponds to this transformation, and $a_1(\pi/2\kappa)$
and $a_2(\pi/2\kappa)$ are equal to $b_1$ and $b_2$ in equations (10) and (11), respectively.
$\pi/2\kappa$ is equal to $T_{hop}/4$, where $T_{hop} = 2\pi/\kappa$ is the hopping period. With the transformation
given above, an initial product state $|1\rangle_1|1\rangle_2$ is transformed to
$-i(|2\rangle_1|0\rangle_2 + |0\rangle_1|2\rangle_2)/\sqrt{2}$.
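
As a worked check of the last statement, the following sketch applies the beam-splitter operation to the two-phonon product state in the Schrödinger picture. Here $U$ is shorthand for $U_I(\pi/2\kappa)$, and the vacuum is assumed to be invariant under $U$. The cross terms cancel because the two creation operators commute, which is the two-phonon interference itself.

```latex
% Action of U = U_I(\pi/2\kappa) on the creation operators
U a_1^\dagger U^\dagger = \frac{a_1^\dagger - i a_2^\dagger}{\sqrt{2}}, \qquad
U a_2^\dagger U^\dagger = \frac{-i a_1^\dagger + a_2^\dagger}{\sqrt{2}}

% Applied to |1>_1 |1>_2 = a_1^\dagger a_2^\dagger |0>_1 |0>_2
U |1\rangle_1 |1\rangle_2
  = \frac{1}{2}\,(a_1^\dagger - i a_2^\dagger)(-i a_1^\dagger + a_2^\dagger)\,|0\rangle_1 |0\rangle_2
  = -\frac{i}{2}\left(a_1^{\dagger 2} + a_2^{\dagger 2}\right)|0\rangle_1 |0\rangle_2
  = -\frac{i}{\sqrt{2}}\left(|2\rangle_1 |0\rangle_2 + |0\rangle_1 |2\rangle_2\right)
```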

In addition to the local-mode picture described immediately above, the 
two-phonon dynamics and the HOM effect can be understood in terms of the 
collective-mode picture. The hopping Hamiltonian, equation (6), can be diag- 
onalized in the two-phonon subspace using the following three eigenkets in the 
collective-mode picture 


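$$|2_c, 0_r\rangle = \frac{1}{\sqrt{2}} a_c^{\dagger 2} |0\rangle_1|0\rangle_2 = \frac{1}{2}|2\rangle_1|0\rangle_2 + \frac{1}{\sqrt{2}}|1\rangle_1|1\rangle_2 + \frac{1}{2}|0\rangle_1|2\rangle_2 \qquad (12)$$

$$|1_c, 1_r\rangle = a_c^\dagger a_r^\dagger |0\rangle_1|0\rangle_2 = \frac{1}{\sqrt{2}}\left(|2\rangle_1|0\rangle_2 - |0\rangle_1|2\rangle_2\right) \qquad (13)$$

$$|0_c, 2_r\rangle = \frac{1}{\sqrt{2}} a_r^{\dagger 2} |0\rangle_1|0\rangle_2 = \frac{1}{2}|2\rangle_1|0\rangle_2 - \frac{1}{\sqrt{2}}|1\rangle_1|1\rangle_2 + \frac{1}{2}|0\rangle_1|2\rangle_2 \qquad (14)$$

with eigenvalues $+\hbar\kappa$, 0 and $-\hbar\kappa$, respectively. Here

$$a_c^\dagger = (a_1^\dagger + a_2^\dagger)/\sqrt{2} \qquad (15)$$

$$a_r^\dagger = (a_1^\dagger - a_2^\dagger)/\sqrt{2} \qquad (16)$$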
are the creation operators for the radial centre-of-mass (c.m.) and rocking modes, 
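respectively. Using these eigenkets, an initial Fock state $|1\rangle_1|1\rangle_2$ can be expressed as

$$|1\rangle_1|1\rangle_2 = \frac{1}{\sqrt{2}}\left(|2_c, 0_r\rangle - |0_c, 2_r\rangle\right) \qquad (17)$$

Under the dynamics due to the hopping Hamiltonian, this state alternates with the
following state (apart from a phase factor) with period $2\pi/2\kappa = T_{hop}/2$:

$$\frac{1}{\sqrt{2}}\left(|2_c, 0_r\rangle + |0_c, 2_r\rangle\right) = \frac{1}{\sqrt{2}}\left(|2\rangle_1|0\rangle_2 + |0\rangle_1|2\rangle_2\right) \qquad (18)$$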


These dynamics cause an oscillation of the phonon coincidence between the two 
sites, which can be observed in experiments. 
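
The coincidence oscillation described above can be reproduced with a minimal sketch that evolves $|1\rangle_1|1\rangle_2$ in the collective-mode eigenbasis. It assumes ideal, lossless hopping with the hopping rate $\kappa = 2\pi \times 2.05$ kHz quoted for the numerical simulation; the function names are hypothetical and the sketch is not the Lindblad simulation used in the paper.

```python
import cmath
import math

KAPPA = 2 * math.pi * 2.05e3  # hopping rate in rad/s (value quoted for the simulation)

def evolve_one_one(t, kappa=KAPPA):
    """Amplitudes (c20, c11, c02) at time t for the initial state |1>_1 |1>_2."""
    s = 1 / math.sqrt(2)
    # Collective eigenkets written in the local basis (|2,0>, |1,1>, |0,2>).
    ket_2c0r = (0.5, s, 0.5)    # eigenvalue +hbar*kappa
    ket_0c2r = (0.5, -s, 0.5)   # eigenvalue -hbar*kappa
    # |1,1> = (|2c,0r> - |0c,2r>)/sqrt(2); each eigenket acquires exp(-i E t / hbar).
    w_plus = s * cmath.exp(-1j * kappa * t)
    w_minus = -s * cmath.exp(+1j * kappa * t)
    return tuple(w_plus * p + w_minus * m for p, m in zip(ket_2c0r, ket_0c2r))

def coincidence(t, kappa=KAPPA):
    """Probability of finding one phonon at each site."""
    return abs(evolve_one_one(t, kappa)[1]) ** 2

t_bs = math.pi / (2 * KAPPA)  # beam-splitter time, T_hop / 4
print(f"beam-splitter time: {t_bs * 1e6:.0f} us")
print(f"coincidence at t_bs: {coincidence(t_bs):.3f}")
```

In this idealized picture the coincidence follows $\cos^2(\kappa t)$, so it vanishes at $T_{hop}/4$ and returns to 1 at $T_{hop}/2$.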

Experimental procedure. The 729-nm laser propagates in a direction that creates 
angles of 45°, 45° and 90° with the x, y and z directions, respectively. Sideband 
cooling is performed for both the x and y directions, where x is the other of the 
two radial directions that is orthogonal to y. The axial direction, z, is cooled 
only by Doppler cooling. There are two collective modes in both the x and y direc- 
tions, namely, the c.m. (in-phase) mode and the rocking (out-of-phase) mode. 
Their frequency separation is of the order of $\kappa/2\pi \approx 2$ kHz, and here, for both x
and y, the c.m. and rocking modes are cooled at the same time. The average
quantum numbers in the y direction after the sideband cooling are
$(\bar{n}_{y,\mathrm{c.m.}}, \bar{n}_{y,\mathrm{rock}}) = (0.04 \pm 0.07, 0.03 \pm 0.12)$. For the x direction, $\bar{n}_{x,\mathrm{c.m.}}$ and $\bar{n}_{x,\mathrm{rock}}$
are estimated to be < 0.7 and < 0.2, respectively.

Individual observation of the two ions is enabled by illuminating them with
unequal intensities; this is achieved by displacing the centre of the waist of the
397-nm beam from the midpoint of the inter-ion distance, so that their fluorescence
levels differ.

Numerical simulation and fidelity of sideband Rabi pulses. We performed a
numerical simulation of the whole dynamics of the two-ion system irradiated with 
the lasers, in order to confirm the experimental results and to support the fidelity 
analysis. We used a Liouville equation with Lindblad-type relaxation terms. The 
parameters used for the sideband Rabi rotations in the simulation were a frequency
of 26.3 kHz and an $e^{-1}$ decay time of 100 μs. We assumed the latter to be due to
a pure phase relaxation. These two quantities were estimated from experimental
results for sideband Rabi oscillations. Finally, $\kappa$ was assumed to be $2\pi \times 2.05$ kHz.
The reason for the relatively short decay time of the sideband Rabi oscillation
(100 μs) is currently unknown. The possible causes are: phase jitter of the excitation
pulse due to beam jitter, fluctuations of AC Stark shifts due to the 729-nm laser, and 
relaxation of the motional coherence due to other motional modes. 

We estimated the expected fidelity based on this simulation. The sum of the 
diagonal components of the density matrix, the sum of its off-diagonal components,
and the fidelity were found to be, respectively, (0.742 ± 0.000, 0.407 ± 0.014,
0.575 ± 0.007). As stated above, this was obtained while assuming relatively fast
relaxation in the sideband Rabi oscillations. The resultant imperfections and
infidelities are relatively large. If we assume perfect fidelity for the analysis pulses
(the red-sideband $\pi$ pulse on $|g, 2\rangle_i \leftrightarrow |e, 1\rangle_i$, the blue-sideband $\pi/2$ pulse on
$|g, 0\rangle_i \leftrightarrow |e, 1\rangle_i$ and, additionally, the blue-sideband $\pi/2$ pulse with varying phase
on $|g, 0\rangle_i \leftrightarrow |e, 1\rangle_i$ in the case of parity analysis), the quantities quoted above are
found to be (0.885 ± 0.000, 0.753 ± 0.032, 0.819 ± 0.016), respectively.

From these two cases, we estimated the ratios of the reduction of the diagonal 
and off-diagonal components due to the imperfections in the sideband Rabi rotations,
which were found to be 0.838 ± 0 and 0.541 ± 0.030, respectively. If these
ratios are used to divide the experimental values (0.691 ± 0.020 and 0.352 ± 0.047,
respectively), the diagonal and off-diagonal components expected in the case of
ideal analysis pulses without imperfection are 0.825 ± 0.024 and 0.651 ± 0.094,
respectively. The fidelity in this case is then estimated to be 0.738 ± 0.048. This
value well exceeds the threshold of 0.5 for the entangled states.
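
The correction described in this section is a pair of ratios followed by a division, which the sketch below reproduces from the central values quoted above. Uncertainties are not propagated here, and the function name is hypothetical. Because the ratios are kept at full precision rather than rounded to three digits, the last digit of the intermediate values may differ slightly from the quoted ones.

```python
def corrected_fidelity(measured, simulated_real, simulated_ideal):
    """Rescale measured (diagonal, off-diagonal) sums by simulated reduction ratios."""
    # Reduction ratio = value with imperfect pulses / value with ideal pulses.
    ratios = [real / ideal for real, ideal in zip(simulated_real, simulated_ideal)]
    # Dividing the measurement by the ratio estimates the ideal-pulse value.
    corrected = [m / r for m, r in zip(measured, ratios)]
    return ratios, corrected, 0.5 * sum(corrected)

ratios, corrected, fidelity = corrected_fidelity(
    measured=(0.691, 0.352),
    simulated_real=(0.742, 0.407),
    simulated_ideal=(0.885, 0.753),
)
print([round(r, 3) for r in ratios])
print([round(c, 3) for c in corrected])
print(round(fidelity, 3))
```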

Imperfections in hopping and coincidence results. In the result shown in Fig. 2b, 
the maximum value of the phonon population does not reach 1. In addition, 
the results do not begin from this value at the origin of the horizontal axis. This 
behaviour would not be observed in an ideal case involving instant preparation 
and analysis of the phonon states with perfect fidelity. Instead, the phonon popu- 
lation in ion 1 would begin at the maximum value, which should be 1. The former 
imperfection is due to the reduction of the fidelity in the sideband Rabi rotations
(see the previous section), which causes imperfect preparation of the phonon
Fock states. The latter imperfection is due to the non-negligible lengths of the
pulses used for preparation and analysis compared with the hopping period. The
blue-sideband $\pi$ and quenching pulses used for the preparation have durations
of ~9 μs and ~30 μs, respectively, and the red-sideband $\pi$ pulse used for analysis
(mapping) has a duration of ~19 μs. In Fig. 2b, it can be seen that the time origin
of the dynamics is shifted in the negative direction by ~67 μs, which is not very
different from the sum of the above three values. 
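
The timing argument can be made concrete with a few lines of arithmetic. The sketch below sums the quoted pulse durations and compares the ideal beam-splitter time $T_{hop}/4$ with the observed time plus the fitted shift. It assumes $\kappa/2\pi = 2.05$ kHz (the value used in the numerical simulation) and takes the approximate values t ≈ 57 μs and ~67 μs from the discussion of Fig. 2b; all variable names are illustrative.

```python
# Approximate pulse durations quoted in the text, in microseconds.
pulse_durations_us = {"blue-sideband pi": 9, "quenching": 30, "red-sideband pi": 19}
total_pulse_us = sum(pulse_durations_us.values())

kappa_hz = 2.05e3              # kappa / (2*pi), as assumed in the simulation
t_hop_us = 1e6 / kappa_hz      # hopping period T_hop = 2*pi / kappa
ideal_bs_us = t_hop_us / 4     # ideal 50:50 beam-splitter time

observed_bs_us = 57            # approximate time point read from Fig. 2b
fitted_shift_us = 67           # approximate shift of the time origin

print("sum of pulse durations (us):", total_pulse_us)
print("ideal beam-splitter time (us):", round(ideal_bs_us))
print("observed time plus shift (us):", observed_bs_us + fitted_shift_us)
```

With these numbers the observed time plus the shift lands within a few microseconds of the ideal quarter period, while the summed pulse durations account for most, but not all, of the shift.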

The imperfections in the maximum value and the shift of the time origin 
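in Fig. 2c can be explained in a similar manner to the case shown in Fig. 2b.

Fidelity of $(|2\rangle_1|0\rangle_2 + |0\rangle_1|2\rangle_2)/\sqrt{2}$ entangled state. We estimated the fidelity
of the $(|2\rangle_1|0\rangle_2 + |0\rangle_1|2\rangle_2)/\sqrt{2}$ entangled state at t ≈ 56 μs in the following manner.
(1) A red-sideband $\pi$ pulse that was resonant with the $|g, 2\rangle_i \leftrightarrow |e, 1\rangle_i$ transition
was applied to $(|2\rangle_1|0\rangle_2 + |0\rangle_1|2\rangle_2)/\sqrt{2}$. Thus, this state was transferred
to $(|e, 1\rangle_1|g, 0\rangle_2 + |g, 0\rangle_1|e, 1\rangle_2)/\sqrt{2}$. By rewriting $|e, 1\rangle_i$ as $|\uparrow\rangle_i$ and $|g, 0\rangle_i$
as $|\downarrow\rangle_i$, this state could be expressed as $(|\uparrow\rangle_1|\downarrow\rangle_2 + |\downarrow\rangle_1|\uparrow\rangle_2)/\sqrt{2}$. (2) A blue-sideband
$\pi/2$ pulse that was resonant with the $|\downarrow\rangle_i \leftrightarrow |\uparrow\rangle_i$ transition was
applied to $(|\uparrow\rangle_1|\downarrow\rangle_2 + |\downarrow\rangle_1|\uparrow\rangle_2)/\sqrt{2}$. Thus, this state was transferred to
$(|\uparrow\rangle_1|\uparrow\rangle_2 + |\downarrow\rangle_1|\downarrow\rangle_2)/\sqrt{2}$. (3) In order to estimate the diagonal components of
the density matrix, $\rho_{\downarrow\downarrow,\downarrow\downarrow}$ and $\rho_{\uparrow\uparrow,\uparrow\uparrow}$, the fluorescence of the two ions was recorded
by illuminating the system with the 397-nm laser at this point. (4) To measure the
non-diagonal components of the density matrix, $\rho_{\downarrow\downarrow,\uparrow\uparrow}$ and $\rho_{\uparrow\uparrow,\downarrow\downarrow}$, we performed a
parity measurement. A 729-nm blue-sideband $\pi/2$ pulse with varying phase was
applied to $(|\uparrow\rangle_1|\uparrow\rangle_2 + |\downarrow\rangle_1|\downarrow\rangle_2)/\sqrt{2}$ and the fluorescence of the two ions was
recorded.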

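We obtained the fidelity of the state $(|2\rangle_1|0\rangle_2 + |0\rangle_1|2\rangle_2)/\sqrt{2}$ using the relation

$$F = \langle\Psi|\rho|\Psi\rangle = \frac{1}{2}\left(\rho_{\uparrow\uparrow,\uparrow\uparrow} + \rho_{\downarrow\downarrow,\downarrow\downarrow} + \rho_{\uparrow\uparrow,\downarrow\downarrow} + \rho_{\downarrow\downarrow,\uparrow\uparrow}\right) \qquad (19)$$

If this value exceeds 0.5, we regard the generated state as an entangled state.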
For renewable energy sources such as solar, wind, and hydroelectric
to be effectively used in the grid of the future, flexible and scalable
energy-storage solutions are necessary to mitigate output fluctuations.
Redox-flow batteries (RFBs) were first built in the 1940s
and are considered a promising large-scale energy-storage technology.
A limited number of redox-active materials (mainly
metal salts, corrosive halogens, and low-molar-mass organic compounds)
have been investigated as active materials, and only a few
membrane materials, such as Nafion, have been considered
for RFBs. However, for systems that are intended for both domestic
and large-scale use, safety and cost must be taken into account as
well as energy density and capacity, particularly regarding long-term
access to metal resources, which places limits on lithium-ion-based
and vanadium-based RFB development. Here we
describe an affordable, safe, and scalable battery system, which uses 
organic polymers as the charge-storage material in combination 
with inexpensive dialysis membranes, which separate the anode 
and the cathode by the retention of the non-metallic, active 
(macro-molecular) species, and an aqueous sodium chloride solu- 
tion as the electrolyte. This water- and polymer-based RFB has an 
energy density of 10 watt hours per litre, current densities of up to 
100 milliamperes per square centimetre, and stable long-term 
cycling capability. The polymer-based RFB we present uses an 
environmentally benign sodium chloride solution and cheap, com- 
mercially available filter membranes instead of highly corrosive 
acid electrolytes and expensive membrane materials. 

A growing interest in organic redox-active materials has been 
observed over the last few years (Extended Data Table 1). Semi-organic 
systems include TEMPO (2,2,6,6-tetramethylpiperidinyloxyl)/lithium,
anthraquinone/lithium, and viologen/lithium. Despite
high cell voltages and good theoretical capacities, the power capability
of these systems is limited, owing to low ion mobility in the organic
solvents used. This drawback can be overcome by using acid-based,
aqueous electrolytes as suggested for a bromine-anthraquinone cell.
However, bromine and highly acidic electrolytes represent a substantial
risk and a challenge to all applied system components (for example,
corrosion of pumps, pipes, and storage tanks), and so interest in all-organic
RFBs arose. Yet these systems have low capacities (at most
5 W h l⁻¹), poor current densities (at most 10 mA cm⁻²), and poor
cycling stabilities (below 30 cycles).

Virtually all realized RFBs are made of two electrolyte circuits sepa- 
rated by an ion-selective membrane. Considerable effort has been 
invested to develop membrane materials that show a low area resistiv- 
ity, are chemically resistant to acidic electrolytes, and are highly select- 
ive to prevent cross-contamination of the electrolytes. Nevertheless, 
“the membrane has been identified as one of the main obstacles in the 
commercialisation of many redox flow cells”.

Perfluorinated ion-exchange membranes are commonly used, 
because they are robust and withstand a highly oxidative and corrosive 
environment. However, Nafion, the most commonly used material,
accounts for almost 40% of the cost of the reaction cell. Alternative
membrane materials include microporous membranes such as (filled
and/or modified) Daramic or Celgard; the latter has also
been tested with organic polymer solutions. Celgard (pore radius of
14–21 nm) targets a pore-size exclusion effect (steric hindrance),
but only works with polymers of very high molar mass. Several
attempts to use nanofiltration membranes (pore radii in the 1-nm
range) in vanadium-based RFBs showed that it is possible to achieve
vanadium/proton selectivity by means of pore-size exclusion.
However, the corrosive electrolyte is highly demanding in terms of
chemical membrane stability, and large-scale applications require
inexpensive, easy-to-manufacture materials.

The aforementioned challenges illustrate the need for a battery sys- 
tem that combines a water-based electrolyte with an organic redox- 
active material and a suitable low-cost membrane. Here we propose a 
new RFB design that fulfils these demands by using (1) organic polymers
as the redox-active species, (2) an aqueous sodium chloride
solution as the electrolyte, and (3) simple dialysis membranes (Fig. 1). 

Dialysis membranes, which can retain macromolecules of high 
molar mass while allowing small ions to pass regardless of their charge, 
are affordable and widely used, from laboratory-scale experiments to
industrial water-treatment facilities. For the proposed polymer-based
RFB, we chose a cellulose-based dialysis membrane with a
molecular-weight cut-off (MWCO; indicating the lowest retained
molar mass) of 6,000 g mol⁻¹ and an aqueous sodium chloride solution
as the supporting electrolyte. Both components were selected
because of their compatibility with the chosen redox-active polymers.
A large number of redox-active polymers have been studied for use in
solid-state batteries in the past, but only compounds that show
stable redox behaviour in water are suitable for the proposed RFB 
system. Therefore, extensive screening of potential polymers was per- 
formed. 

The optimized polymers consist of two components: a redox-active 
moiety and a unit enhancing water solubility to prevent precipitation 
in all used redox states. The cathode material contains the TEMPO 
radical as the redox-active moiety, while the anode material uses a 4,4′-bipyridine
derivative (viologen). The water-solubility of both polymers
is enhanced by a quaternary ammonium cation moiety. The cathode
material was prepared by free radical copolymerization of
2,2,6,6-tetramethylpiperidin-4-yl-methacrylate 1 and amine 2 (for synthetic
details see Extended Data Fig. 1). Subsequent oxidation with H₂O₂/Na₂WO₄
yielded the desired polymer P1 (Fig. 1a). The anode material
P2 (Fig. 1a) is obtained by copolymerization of 4-vinylbenzyl chloride
3 and the amine 4 (Extended Data Fig. 1), followed by polymer-analogous
functionalization with N-methyl-bipyridinium iodide and an
ion exchange to chloride. Both materials were prepared at kilogram
scale. A modifier (2-mercaptoethanol) was used in the preparation of
P1 to guarantee a low molar mass (Mn) of about 20,000 g mol⁻¹ and a
low dispersity; this was not necessary in the second polymerization 
process. The molar-mass target was chosen to achieve both a good
retention of the polymer by the dialysis membrane and a dynamic
viscosity as low as possible.

Figure 1 | Working principle of a polymer-based RFB. a, Schematic
representation of a polymer-based RFB consisting of an electrochemical cell
(which determines the power density) and two electrolyte reservoirs (which
determine the storage capacity). The anolyte and catholyte cycle are
separated by a semipermeable size-exclusion membrane, which retains the
redox-active macromolecules while allowing small salt ions to pass. During the

Rheological investigations of the catholyte and anolyte, each with a
capacity of about 10 A h l⁻¹, revealed Newtonian behaviour and
apparent viscosities of 17 mPa s (P1) and 5 mPa s (P2) in the shear-rate
range that is typically attributed to pipe flow (Extended Data
Fig. 2). Consequently, in contrast to RFBs that use a polymer of high
molar mass and microporous Celgard membranes, the energy
required to pump the electrolyte is kept to a minimum, which facil- 
itates efficient transport of the solutions through the reaction cell. 

The performance of an RFB is strongly influenced by the quality of 
the membrane. It has to retain the redox-active species, thus prevent- 
ing internal short-circuits and_ self-discharge processes, while 
facilitating the transport of ions, which are necessary to sustain elec- 
tro-neutrality. The salt permeability (Ps) for sodium chloride through 
the studied cellulose-based dialysis membrane—the thickness-normal- 
ized diffusion coefficient—was found to be (9.34 0.1) X 10°°cms * 
(the uncertainty here and elsewhere corresponds to error propagation 
of linear-regression-analysis error; Extended Data Fig. 4). This leads to 
a low area resistance (R) of the membrane in aqueous sodium chloride
solution of 1.14 ± 0.03 Ω cm², which is in the range of Nafion and
enables good cell performance. The redox-active polymers, which
have a hydrodynamic radius of around 2 nm (determined using
dynamic light scattering), are effectively retained by the dialysis membrane,
which has an estimated pore size <1 nm. Minimum membrane
selectivities (Smin) for sodium chloride over the redox-active polymers
of 290 (P1) and 2,830 (P2) were obtained, which are much higher than
the membrane selectivities of Nafion, and those of nanofiltration
and microporous membranes. In addition, the retention was tested
under real-life conditions in an RFB test cell. Cyclic-voltammetry 
measurements of samples taken from the anolyte and the catholyte 
solutions after 10,000 charging/discharging cycles show no detectable 
amounts of polymer P1 transferred from one cell compartment to the 



charging/discharging process, a solution of the redox-active polymers P1 and 
P2 is continuously transported from the electrolyte reservoirs to the 
electrochemical cell, where the redox reactions take place. b, Fundamental 
electrode reactions of P1 (TEMPO radical) and P2 (viologen). Structural details 
of compounds shown in this figure are available as Supplementary Information. 


other (with a detection limit of approximately 2 μg ml⁻¹) and only
traces of P2 (Extended Data Fig. 5). This finding is supported by the 
consistently high coulombic efficiency of 99%. 

The redox properties of the TEMPO/viologen redox pair were 
studied via cyclic voltammetry (Extended Data Fig. 6). The basic elec- 
trode reactions are displayed in Fig. 1b. Upon charging, the TEMPO 
radical is oxidized, forming an oxammonium cation (TEMPO⁺),
while the divalent viologen cation (Viol²⁺) is reduced to a monovalent
radical cation (Viol•⁺). Cyclic voltammetry of P1 reveals a reversible
redox wave at 0.7 V (versus the Ag/AgCl reference electrode), in
accordance with literature values for the TEMPO radical. The viologen-containing
polymer P2 shows quasi-reversible redox reactions at
−0.4 V and −0.8 V. Because the second reaction yields a neutral,
water-insoluble species (Viol⁰), only the first step was studied in detail:



Figure 2 | Charge/discharge behaviour. A representative cell voltage profile
of a pumped 5-cm² test cell during constant-current cycling at 40 mA cm⁻²
with 10 ml of P1 and 15 ml of P2 solution (charge storage capacity adjusted to
10 A h l⁻¹ in aqueous NaCl solution (2 mol l⁻¹), 25 °C).
Figure 3 | Electric performance of the polymer-based RFB cell. a, b, The
capacity, coulombic efficiency, and energy efficiency of a pumped 5-cm² test
cell (10 ml of P1 and 15 ml of P2 in aqueous NaCl solution (2 mol l⁻¹); storage
capacity adjusted to 10 A h l⁻¹, 25 °C) as a function of discharging current
density (a; charging at 40 mA cm⁻²) and charging current density
(b; discharging at 40 mA cm⁻²).


cyclic voltammetry reveals two waves at —0.40 V and —0.53 V for the 
reduction process, and a sharp peak at −0.45 V and a broader signal at
−0.38 V for the re-oxidation. This signal split can be attributed to a
reversible intramolecular association/dimer formation between two
viologen radical cations, which is supported by ultraviolet-visible
spectroelectrochemical studies: upon reduction, a set of absorption
bands that are characteristic of the formation of viologen radical
dimers arises (Extended Data Fig. 7). Additionally, applying a re-oxidizing
potential restores the initial spectrum, indicating reversibility
of the redox process.

We conducted rotating-disk-electrode (RDE) voltammetry to 
obtain further kinetic data (Extended Data Figs 8 and 9); see 
Methods for details of the subsequent analysis. Levich analysis of the 
voltammograms, obtained for a variety of rotation speeds, yields 
diffusion coefficients (D) of (7.0+0.5)X10%cm*s ’ and 
(7.60.9) X10 ’cm’s ' for the polymers P1 and P2, respectively. 
Subsequent Koutecky—Levich analysis for P1 provides mass-trans- 
port-independent currents, which are fitted to the Butler-Volmer 
equation to obtain an electron-transfer rate constant (k°) of 
(45+0.1)X10 “cms ‘. Tafel analysis determines the transfer 
coefficient (α) for P1 to be 0.68 ± 0.03, which is fairly close to 0.5,
the value for an ideally reversible redox reaction. For P2, RDE 
voltammetry yielded analysable data only for high rotation rates, 
and a two-step mechanism is assumed for the first reduction process 
on the basis of the obtained voltammograms. Appropriate analysis 
yields k°=(9+2)X10 °cms ‘. The standard electron-transfer 
rates for P1 and P2 are in the range of common small-molecule
RFB redox-active materials.
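
For readers unfamiliar with Levich analysis, the sketch below shows how a diffusion coefficient can be extracted from limiting currents measured at several rotation rates. It uses the standard Levich equation, i_lim = 0.620 n F A D^(2/3) ω^(1/2) ν^(−1/6) c, with ω in rad s⁻¹. All numerical inputs (electrode area, kinematic viscosity, concentration, rotation rates and the reference diffusion coefficient) are placeholder values chosen for a synthetic round trip and are not taken from the paper; the function names are hypothetical, and the actual analysis is described in Methods.

```python
import math

FARADAY = 96485.332  # C mol^-1

def levich_slope(currents_a, omegas_rad_s):
    """Least-squares slope of limiting current versus sqrt(omega), forced through zero."""
    xs = [math.sqrt(w) for w in omegas_rad_s]
    return sum(x * i for x, i in zip(xs, currents_a)) / sum(x * x for x in xs)

def diffusion_coefficient(slope, n, area_cm2, nu_cm2_s, conc_mol_cm3):
    """Invert the Levich slope 0.620 n F A D^(2/3) nu^(-1/6) c for D (cm^2 s^-1)."""
    prefactor = 0.620 * n * FARADAY * area_cm2 * nu_cm2_s ** (-1 / 6) * conc_mol_cm3
    return (slope / prefactor) ** 1.5

# Synthetic round trip with placeholder values (not taken from the paper).
D_true, n, area, nu, conc = 1.0e-6, 1, 0.2, 0.01, 2.0e-6
omegas = [2 * math.pi * rpm / 60 for rpm in (400, 900, 1600, 2500)]
currents = [0.620 * n * FARADAY * area * D_true ** (2 / 3) * math.sqrt(w)
            * nu ** (-1 / 6) * conc for w in omegas]
D_fit = diffusion_coefficient(levich_slope(currents, omegas), n, area, nu, conc)
print(f"recovered D = {D_fit:.3e} cm^2/s")
```

Forcing the fit through the origin reflects the Levich prediction that the limiting current vanishes at zero rotation rate; real data may call for an intercept term.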

A battery was built using aqueous solutions of the redox-active polymers
P1 and P2, a cellulose-based dialysis membrane (5 cm² active
area), and sodium chloride as supporting electrolyte.

Figure 4 | Cycling stability of the polymer-based RFB. a, The long-term
stability of the polymer-based electrolytes was studied by repeated charge/
discharge cycling over 10,000 cycles at 20 mA cm⁻² in an unpumped test cell.
Inset, the open-circuit voltage of a polymer-based RFB as a function of the state
of charge (static 5-cm² test cell, P1 (storage capacity of 2 A h l⁻¹) and
P2 (storage capacity of 4 A h l⁻¹) in aqueous NaCl solution (2 mol l⁻¹), 25 °C).
b, A pumped cell was repeatedly cycled at 40 mA cm⁻² (10 ml of P1 and 15 ml
of P2 in aqueous NaCl solution (2 mol l⁻¹); storage capacity adjusted to
10 A h l⁻¹, 25 °C).

With its MWCO
of 6,000 g mol⁻¹, the membrane can effectively retain both polymers
(with molar masses three times larger than the MWCO). The cell
provides an open-circuit voltage of 1.1 V and can be safely charged
and discharged within a voltage window of 0.80–1.35 V; we did not
observe evolution of oxygen, hydrogen, or chlorine. Because the charging
process is accompanied by a strong colour shift from orange to
yellow (P1) and ochre to blue (P2), solution colour represents a simple
indication of the state of charge of the battery (Extended Data
Fig. 3). Representative charging/discharging curves for constant-current
cycling at 40 mA cm⁻² are displayed in Fig. 2. The cell can
be charged and discharged within the chosen ‘real-life’ voltage window
at current densities of up to 40 mA cm⁻², while retaining most of
its initial capacity, and achieving an energy efficiency between 75%
and 80% (Fig. 3). Pulse current densities of up to 100 mA cm⁻² are
possible. At a theoretical capacity of 10 A h l⁻¹, a material utilization of
up to 82% was observed, which corresponds to energy densities of
10.8 W h l⁻¹ (charging) and 8.0 W h l⁻¹ (discharging). The observed
performance approaches that of conventional vanadium-based RFBs and surpasses
that of all-organic RFBs that use ‘small’ redox-active molecules in combination
with conventional membranes.
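
A quick consistency check of these figures is sketched below. It assumes that each energy density equals the utilized capacity multiplied by a mean cell voltage, and that the same charge is moved during charging and discharging (the small coulombic loss is ignored). Both are simplifications introduced here, not statements from the paper.

```python
theoretical_capacity = 10.0   # A h per litre
utilization = 0.82            # highest material utilization quoted in the text
used_capacity = theoretical_capacity * utilization   # A h per litre

energy_charge, energy_discharge = 10.8, 8.0          # W h per litre

# Implied mean cell voltages under the stated assumptions.
v_charge = energy_charge / used_capacity
v_discharge = energy_discharge / used_capacity
ratio = energy_discharge / energy_charge

print(f"mean charging voltage: {v_charge:.2f} V")
print(f"mean discharging voltage: {v_discharge:.2f} V")
print(f"discharge/charge energy ratio: {ratio:.2f}")
```

Under these assumptions both implied voltages lie inside the 0.80–1.35 V window, and the energy ratio is close to the lower end of the quoted energy-efficiency range.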

Incremented charge/discharge experiments yielded a relatively flat, 
sigmoid open-circuit-voltage curve, which indicates a stable voltage 
between 10% and 90% state of charge and allows us to monitor the 
battery by measuring the open-circuit voltage. Cycling studies at con- 
stant current revealed good stability of the developed polymer-based 
RFB in comparison to other organic RFBs presented in the literature
(Extended Data Table 1). Even after extended long-term tests of 10,000 
cycles in a static, unpumped cell, 80% of the initial capacity was 
retained (at 20 mA cm⁻²). Because mass transport relies solely on
diffusion in this set-up, the rapid charging allowed for a material
utilization of only 41%. In a pumped cell, higher states of charge were
achieved at 40 mA cm⁻² with a material utilization of 75%. A faster
capacity fade caused by a side-reaction can be observed. This might be
induced by oxygen, which slowly enters the electrolyte as a result of
mechanical abrasion of the tubes in the peristaltic pump, causing
oxidation of the viologen radical cation Viol•⁺ (Fig. 4).
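
To put the long-term result for the static cell in perspective, the retained capacity can be converted to an average loss per cycle. The sketch below assumes a constant fractional fade in every cycle, which is an idealization made here; the measured fade need not be uniform. The function name is hypothetical.

```python
def per_cycle_retention(final_fraction, cycles):
    """Average per-cycle retention, assuming the same fractional fade every cycle."""
    return final_fraction ** (1.0 / cycles)

# 80% of the initial capacity retained after 10,000 cycles (static cell).
retention = per_cycle_retention(0.80, 10_000)
fade_ppm = (1 - retention) * 1e6
print(f"average fade: {fade_ppm:.1f} ppm per cycle")
```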

By combining simple dialysis membranes, which are only 5% to 10% 
of the cost of Nafion, and safe polymer-based aqueous electrolytes, we 
designed an affordable RFB concept. We expect that further perform- 
ance improvements can be achieved by optimizing the process para- 
meters, hardware engineering, and polymer synthesis. Limitations 
resulting from the viscosity of the electrolyte solutions could be overcome
by replacing linear polymers with (hyper-)branched polymers,
the latter generally having lower viscosities at higher concentrations,
enabling increased energy densities. Besides the TEMPO and the viologen
moieties, a large number of potential organic redox-active units
might help to further boost cell voltage and cycling stability of future 
polymer-based RFBs. In any case, the presented work lays the founda- 
tion for a new battery principle, which could lead to the production of 
economical energy-storage devices that use safe, metal-free, and all- 
organic raw materials. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS

Instruments and reagents. All reagents were bought from TCI, Sigma-Aldrich, 
and AlfaAesar, and used as received without further purification. N-Methyl-4,4'- 
bipyridinium iodide was synthesized according to literature procedures using the 
crude product for further reaction steps. The polymerization reactions were
carried out in an inert argon atmosphere. ¹H NMR spectra were obtained on a
FOURIER 300 (Bruker), ATR-IR (infrared attenuated total reflection) spectra on
an IRAffinity-1 (Shimadzu), and X-band EPR (electron paramagnetic resonance)
spectra on an EMXmicro CW-EPR spectrometer (Bruker) using powdered samples.
Asymmetric-flow field-flow fractionation (AF4) (Postnova Analytics) was
used to determine molar masses. The flow behaviour of P1 and P2 (polymer
concentrations equal to a charge storage capacity of about 10 A h l⁻¹) in aqueous
NaCl solution (1 mol l⁻¹) was studied on an MCR301 rotational rheometer
(Anton Paar) at 20°C under continuous shear using a double-gap measuring 
system (DG26.7). 

Representative synthesis procedures. Poly(2,2,6,6-tetramethylpiperidinyloxy-4-yl
methacrylate-co-[2-(methacryloyloxy)ethyl]trimethylammonium chloride), P1.
2,2,6,6-Tetramethylpiperidin-4-yl-methacrylate 1 (50 g, 222 mmol),
[2-(methacryloyloxy)ethyl]trimethylammonium chloride 2 (53.3 ml, 222 mmol, 80%
solution in water), and 2-mercaptoethanol (2.5 ml, 35 mmol) were dissolved in
dilute hydrochloric acid (295 ml, 0.75 mol l⁻¹). After flushing the solution with
argon for 60 min, 4,4'-azobis(4-cyanovaleric acid) (ABCVA) (7.5 g, 22 mmol) was
added. The reaction mixture was stirred at 75 °C for 6 h. Subsequently, the
solution was cooled to room temperature, and hydrogen peroxide (34 ml, 333 mmol,
30% solution in water), sodium tungstate (0.85 g, 2.9 mmol), EDTA (0.28 g, 1 mmol),
and aqueous sodium hydroxide solution (110 ml, 10 wt%) were added. The solution
was stirred for 48 h; additional hydrogen peroxide (34 ml, 333 mmol) was added
after 24 h. Afterwards, the solution was filtered, dialysed against water
(MWCO = 1,000 g mol⁻¹) and lyophilized to yield an orange powder (94 g).

The properties of P1 are as follows. M_n = 20,200 g mol⁻¹; M_w = 33,700 g mol⁻¹,
dispersity Đ = 1.7 (determined by AF4); ¹H NMR (D₂O, 300 MHz) δ (in p.p.m.),
4.65 (s, br, 1H), 4.17 (s, br, 2H), 3.44 (s, br, 2H), 2.90 (s, br, 9H), and
2.15-0.25 (m, 26H) (radical quenched by phenylhydrazine); ATR-IR (powder; in
cm⁻¹), 3,400 (vb), 2,974 (w), 2,945 (w), 1,721 (s), 1,475 (w), 1,388 (w), 1,236
(w), 1,145 (m), and 952 (w); g = 2.0074 (determined by EPR); capacity,
39 mA h g⁻¹ (determined by titration) and 37 mA h g⁻¹ (determined by EPR); mean
hydrodynamic radius, 2 nm (determined by dynamic light scattering).
Poly(N-4-vinylbenzyl-N'-methyl-4,4'-bipyridinium dichloride-co-4-vinylbenzyl
trimethylammonium chloride), P2. 4-Vinylbenzyl chloride 3 (94.5 g, 620 mmol),
4-vinylbenzyl trimethylammonium chloride 4 (20.2 g, mmol), and
2,2'-azobis(2-methylpropionitrile) (3.5 g, 21 mmol) were dissolved in dimethyl
sulfoxide (800 ml). After flushing the solution with argon for 60 min, the
reaction mixture was stirred at 75 °C for 6 h. Subsequently,
N-methyl-4,4'-bipyridinium iodide (195 g, 620 mmol) was added, and the solution
was stirred at 80 °C for 48 h and dialysed against water
(MWCO = 10,000 g mol⁻¹). Ion exchange from iodide to chloride was performed
using Dowex Marathon A exchange resin. The obtained solution was lyophilized to
yield an ochre powder (172 g).

The properties of P2 are as follows. M_n = 30,900 g mol⁻¹; M_w = 73,400 g mol⁻¹,
Đ = 2.4 (determined by AF4); ¹H NMR (D₂O, 300 MHz) δ (in p.p.m.),
8.95 (m, 4H), 8.43 (s, br, 4H), 7.75-6.20 (m, 4H + 4H), 5.78 (s, br, 2H), 4.40
(s, br, 3H), 2.90 (s, br, 9H), and 2.25-0.20 (m, 3H + 3H); ATR-IR (powder; in
cm⁻¹), 3,360 (b), 3,005 (m), 2,926 (m), 1,635 (s), 1,558 (m), 1,506 (m), 1,446
(m), 1,352 (w), 1,217 (w), 823 (m), 796 (m), and 669 (w); capacity,
51 mA h g⁻¹ (determined by charging/discharging test); mean hydrodynamic radius,
2 nm (determined by dynamic light scattering).
Electrochemical investigations. Cyclic voltammetry and RDE voltammetry. 
These were conducted on a VersaSTAT potentiostat/galvanostat (Princeton 
Applied Research) using a standard three-electrode set-up with a glassy-carbon- 
disk working electrode (5 mm diameter), a Ag/AgCl/water reference electrode,
and a graphite rod counter electrode. For RDE voltammetry, the rotation speed 
was controlled externally by a Model 636A ring-disk electrode system (Princeton 
Applied Research). 

For P1, analysis of the RDE voltammograms via a Levich plot (limiting current
$i_{\mathrm{lim}}$ versus $\omega^{1/2}$, where $\omega$ is the rotation speed)
yields the corresponding diffusion coefficient $D$ using the Levich equation
$i_{\mathrm{lim}} = 0.62\,nFAD^{2/3}\omega^{1/2}\nu^{-1/6}c_0$, where $n = 1$ is
the number of transferred electrons per redox reaction, $F = 96{,}485$ C mol⁻¹ is
Faraday's constant, $A = 0.20$ cm² is the area of the electrode surface,
$\nu = 1.01 \times 10^{-6}$ m² s⁻¹ is the kinematic viscosity of the aqueous
sodium chloride solution (0.1 mol l⁻¹), and $c_0$ is the bulk concentration of the
redox-active repeating unit of the polymer. Application of the Koutecky-Levich
equation $1/i = 1/i_k + 1/i_{\mathrm{lim}}$ yields the mass-transfer-independent
kinetic current $i_k$, which is subsequently fitted by the Butler-Volmer equation
via a Tafel plot ($\log|i_k|$ versus overpotential). This fitting allows us to
determine $i_0$ ($\log|i_k(0)| = \log|i_0|$), and consequently $k^0$ (via
$i_0 = FAk^0c_0$) and $\alpha$ (via the slope of the Tafel plot:
$-\alpha F/(2.3RT)$ for negative slopes or $(1-\alpha)F/(2.3RT)$ for positive
slopes, where $R$ is the universal gas constant and $T$ is the absolute temperature).
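
The three-step analysis described above (Levich fit, Koutecky-Levich extrapolation, Tafel fit) can be sketched numerically as follows. This is a minimal illustration, not the authors' analysis code: the function names, the through-origin least-squares fit, the use of ln(10) in place of the rounded factor 2.3, the default temperature and all input data are assumptions. Units are cgs (currents in A, concentrations in mol cm⁻³, rotation speed in rad s⁻¹).

```python
import numpy as np

F = 96485.0      # C mol^-1, Faraday's constant (value given in the text)
R_GAS = 8.314    # J mol^-1 K^-1, universal gas constant
A = 0.20         # cm^2, electrode area (value given in the text)
NU = 1.01e-2     # cm^2 s^-1, kinematic viscosity (1.01e-6 m^2 s^-1)
LN10 = np.log(10.0)


def diffusion_from_levich(omega, i_lim, c0, n=1):
    """Fit i_lim = B * omega**0.5 through the origin and invert B for D (cm^2 s^-1).

    omega in rad s^-1, i_lim in A, c0 in mol cm^-3.
    """
    x = np.sqrt(omega)
    B = float(np.dot(x, i_lim) / np.dot(x, x))
    return (B / (0.62 * n * F * A * NU ** (-1.0 / 6.0) * c0)) ** 1.5


def kinetic_current(omega, i):
    """Koutecky-Levich: the intercept of 1/i versus omega**-0.5 is 1/i_k."""
    _, intercept = np.polyfit(omega ** -0.5, 1.0 / i, 1)
    return 1.0 / intercept


def tafel_parameters(eta, i_k, c0, T=298.15):
    """Fit log10|i_k| versus overpotential on a positive-slope branch.

    Returns (k0 in cm s^-1, alpha), using i0 = F*A*k0*c0 and
    slope = (1 - alpha) * F / (ln(10) * R * T).
    """
    slope, intercept = np.polyfit(eta, np.log10(np.abs(i_k)), 1)
    i0 = 10.0 ** intercept
    k0 = i0 / (F * A * c0)
    alpha = 1.0 - slope * LN10 * R_GAS * T / F
    return k0, alpha
```

For a negative-slope Tafel branch, the last step would use alpha = -slope * ln(10) * R * T / F instead, matching the two cases given in the text.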

For P2, a two-step process is assumed for the first reduction process on the basis
of the observed voltammetric behaviour. Furthermore, the analysis of RDE
experiments is restricted to high rotation rates, because a layer of an
intermediate product is observed at low rotation rates. For analysis, the
respective EE mechanism (which includes two subsequent electrochemical reactions)
following ref. 32 is applied. Accordingly, for small negative overpotentials, we
apply the Levich equation and determine $D$ as described above. We analyse the
current for small negative overpotentials $\eta_1$ using
$1/i = 1/(2i_{\mathrm{lim}}) + \exp[\eta_1F/(RT)]/(2FAk^0c_0)$, which yields
$1/i(\eta_1) = \exp[\eta_1F/(RT)]/(2FAk^0c_0)$ as $\omega^{-1/2} \to 0$ and
$\log|1/i(\eta_1 = 0)| = -\log|2FAk^0c_0|$; we use this expression to determine
$k^0$. This analysis does not allow us to determine $\alpha$.

Spectroelectrochemical experiments. These were carried out in a quartz cuvette
(optical path length of 1 mm) containing an aqueous 0.1 mol l⁻¹ NaCl solution, a
platinum-grid working electrode, a platinum-wire auxiliary electrode and a
Ag/AgCl/water reference electrode. The potential was controlled using an Autolab
PGSTAT30 potentiostat (Metrohm). The redox process was monitored by
ultraviolet-visual spectroscopy using a Lambda 750 UV-vis spectrophotometer
(PerkinElmer) and considered complete when there was no further spectral
change.
Charging/discharging tests. These were carried out at 25 °C using a potentiostat
(VMP3, Biologic) and an RFB test cell (JenaBatteries GmbH; see Extended Data
Fig. 3): poly(tetrafluoroethylene) (PTFE) frame, ethylene propylene diene monomer
(EPDM) rubber seals, graphite (cathode) and nickel (anode) current collectors,
graphite felt electrodes (2.25 × 2.25 × 0.4 cm³, GFA6, SGL), and an active
area of 5 cm². For static experiments, the flow-cell set-up was filled with 3 ml
electrolyte solution (P1 at 2 A h l⁻¹ and P2 at 4 A h l⁻¹ in 2 mol l⁻¹ aqueous
NaCl solution) using a syringe and sealed. The effective electrode volume of this
unpumped cell is about 2 ml. For dynamic experiments, polymer solutions with a
charge-storage capacity of 10 A h l⁻¹ in aqueous NaCl solution (2 mol l⁻¹) were
prepared. 10 ml P1 solution and 15 ml P2 solution were transported through the
cell by a peristaltic pump at a flow rate of 20 ml min⁻¹ (Hei-FLOW Advantage,
Heidolph). The battery was charged/discharged under a constant-current regimen.
All electrolyte solutions were kept under an argon atmosphere. The electrical
performance of the polymer-based RFB cell was studied in a pumped cell. For
studying the influence of the discharging current density (20-100 mA cm⁻²), the
charging current density was kept constant at 40 mA cm⁻²; an inverse experiment
was conducted for studying the influence of the charging current density. A
long-term cycling test was performed by repeatedly charging and discharging a
static cell at 20 mA cm⁻². The state-of-charge curve was acquired by incremented
charge/discharge at 5 mA cm⁻² for a time of 10 s. A lower cut-off potential of
0.8 V and an upper cut-off potential of 1.35 V were used for all experiments. For
all experiments a cellulose-based dialysis membrane with an MWCO of 6,000-8,000
g mol⁻¹, a thickness of 70 µm and a pore size <1 nm (Spectra/Por 1, Spectrum
Laboratories) was used. The membrane was pretreated with deionized water
before use.
Membrane characterization. Salt permeability. We determined the salt permeability
($P_s$) using a homemade set-up consisting of two chambers (A and B) separated
by a dialysis membrane (regenerated cellulose, MWCO of 6,000-8,000 g mol⁻¹;
thickness of 70 µm; Spectra/Por 1, Spectrum Laboratories)³³. Chamber A was filled
with sodium chloride feed solution (1.0 mol l⁻¹) and chamber B with deionized
water (25 °C). We determined the change of the salt concentration in chamber B
via conductivity measurements. Subsequently, we calculated the salt permeability
from the change of the salt concentration over time from two averaged runs using

$$P_s = \frac{V_B}{c_A A}\,\frac{\mathrm{d}c_B}{\mathrm{d}t} = \frac{D_s}{L},$$

where $V_{A,B}$ are the volumes of the solutions in chambers A and B, $A$ is the
membrane area, $L$ the membrane thickness, $c_{A,B}$ are the salt concentrations
in chambers A and B, $t$ is time, and $D_s$ is the diffusion coefficient
of the salt.
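
As a rough sketch of how the permeability expression can be evaluated from a measured concentration series, the following fits the linear rise of the chamber-B concentration and converts the slope into a permeability and a diffusion coefficient. The function names, the linear-fit approach and all numerical inputs are illustrative assumptions rather than the authors' procedure.

```python
import numpy as np


def permeability(t_s, c_B, V_B_cm3, c_A, area_cm2):
    """P = V_B / (c_A * A) * dc_B/dt in cm s^-1.

    dc_B/dt is taken as the slope of a linear fit of c_B against time;
    c_A and c_B must share the same concentration unit.
    """
    dcdt, _ = np.polyfit(t_s, c_B, 1)
    return V_B_cm3 / (c_A * area_cm2) * dcdt


def diffusion_coefficient(P_cm_s, thickness_cm):
    """D = P * L in cm^2 s^-1, from P = D / L."""
    return P_cm_s * thickness_cm


# Hypothetical data: c_B rises by 2e-6 mol l^-1 per second.
t = np.array([0.0, 600.0, 1200.0, 1800.0])
c_B = 2.0e-6 * t
P = permeability(t, c_B, V_B_cm3=10.0, c_A=1.0, area_cm2=5.0)
D = diffusion_coefficient(P, thickness_cm=70e-4)  # 70 um membrane
```

The same two functions apply to the polymer crossover data, where the slope is taken only from the linear part of the concentration curve.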

We determined the area resistance R using a 5-cm² test cell. The data represent
an average of three measurements. We measured the electrolyte resistance (aqueous
NaCl, 1 mol l⁻¹) of the cell with (R₁) and without (R₂) a membrane using
electrochemical impedance spectroscopy; R was calculated as R = (R₁ − R₂)A.
Retention of redox-active polymers. We studied the retention of the redox-active
polymers in an RFB test cell (JenaBatteries GmbH; see Extended Data Fig. 3) at
room temperature by pumping a 10 ml feed solution of P1 (60 mg ml⁻¹ in
2.0 mol l⁻¹ NaCl(aq)) or P2 (40 mg ml⁻¹ in 2.0 mol l⁻¹ NaCl(aq)) through one cell
compartment and 10 ml of pristine sodium chloride solution (2.0 mol l⁻¹) through
the second cell compartment. Both compartments were separated by a dialysis
membrane (Spectra/Por 1, Spectrum Laboratories). We analysed the polymer
concentration in the second cell compartment by ultraviolet-visual spectroscopy
using a Lambda 750 UV-vis spectrophotometer (PerkinElmer).

Maximum polymer permeability. We calculated the maximum polymer permeability
($P_{\mathrm{polymer,max}}$) in the same way as $P_s$. In contrast to
permeability data for molecules with a defined molar mass (for example, metal
ions, or 'small' organic molecules), the polymer permeability cannot be expressed
with one absolute value because polymers have a molar mass distribution. The
permeability decreases substantially as the molar mass of a single molecule
increases. Therefore, only $P_{\mathrm{polymer,max}}$ for the smallest polymer
molecules of a given distribution can be determined. Larger polymer molecules
will pass the membrane at a much lower rate; that is, they have a much smaller
polymer permeability.

Membrane selectivity. Membrane selectivity ($S$) describes the ratio of the rates
of ion transfer of the electrolyte salt (H⁺ for a vanadium-based RFB or NaCl for a
polymer-based RFB) and the redox-active species (vanadium salts for a
vanadium-based RFB or P1 and P2 for a polymer-based RFB), according to
$S = P_{s,\mathrm{NaCl}}/P_{\mathrm{polymer,max}}$. In the context of determining
a maximum permeability for polymers, membrane selectivity is considered a minimum
value ($S_{\mathrm{min}}$) for polymers.

Cost calculations. The data on Nafion membrane prices indicated a current
cost of US$500-1,000 m⁻² and a projected long-term cost of US$100-200 m⁻²
(refs 34-36). Dialysis and nanofiltration membranes are currently available at
US$20-100 m⁻² on an industrial scale from manufacturers such as Spectrum
Laboratories, Pall, Microdyn-Nadir, Sartorius and Merck Millipore (for example
http://www.spectrumlabs.com/dialysis/RCtubing.html). Future expansion in
production capabilities and learning-curve effects will further reduce the price
(probably to <<US$10 m⁻²).

Similar to the membrane material, the electrolyte contributes substantially to
the cost of RFBs. Polymers are ubiquitous and can be prepared at very low costs on
a megatonne scale. The prices vary between about US$0.6 kg⁻¹ for commodity
plastics (for example, polyvinyl chloride or polystyrene), US$1.2 kg⁻¹ for
engineering plastics (for example, nylon or polymethyl methacrylate), and
US$1.5-3 kg⁻¹ for high-performance plastics (for example, polyetheretherketone or
polyethylenimine); http://plasticker.de, accessed July 2015. Redox-active
polymers can be prepared at a price similar to high-performance plastics on a
kilotonne scale. Because large photovoltaic and wind farms require megawatt
batteries, which use tonnes of active material, production numbers will soon rise
to a level that allows for economical polymer synthesis. Upon further increases in
the energy density of the polymers and industrial up-scaling of the production, we
anticipate a competitive price for polymer-based RFBs, which benefit from the
less-corrosive electrolyte.
Dynamic light scattering. Dynamic light-scattering measurements were performed
on an ALV CGS-3 (Malvern) equipped with a He-Ne laser (633 nm) at polymer
concentrations of 5 mg ml⁻¹ at 25 °C. We analysed the experimental
autocorrelation functions using the CONTIN algorithm. We calculated the apparent
hydrodynamic radii using the Stokes-Einstein equation.
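
For orientation, the Stokes-Einstein relation R_h = k_B T / (6πηD) can be evaluated as in the sketch below. The function name, the use of the viscosity of pure water at 25 °C (about 0.89 mPa s) in place of the actual salt solution, and the example diffusion coefficient are assumptions made for illustration only.

```python
import math

K_B = 1.380649e-23  # J K^-1, Boltzmann constant


def hydrodynamic_radius(D_m2_s, T_K=298.15, eta_Pa_s=0.89e-3):
    """Apparent hydrodynamic radius in metres from a translational
    diffusion coefficient, via R_h = k_B * T / (6 * pi * eta * D).

    eta defaults to the viscosity of pure water at 25 degC (an assumption).
    """
    return K_B * T_K / (6.0 * math.pi * eta_Pa_s * D_m2_s)


# Hypothetical diffusion coefficient, chosen to give a radius near 2 nm.
radius_m = hydrodynamic_radius(1.2e-10)
```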
Toxicity tests. Cytotoxicity studies were performed with the mouse fibroblast cell
line L929 (CCL-1, ATCC), as recommended by ISO10993-5. The cells were routinely
cultured in DMEM, supplemented with 10% fetal calf serum, 100 U ml⁻¹
penicillin, and 100 µg ml⁻¹ streptomycin (all components from Biochrom), at
37 °C in a humidified 5% (v/v) CO₂ atmosphere.

Cells were seeded at 10⁴ cells per well in a 96-well plate and incubated for 24 h;
no cells were seeded in the outer wells. Afterwards, the testing substances were
added to the cells at concentrations indicated in Extended Data Fig. 7 (0.25 µg
ml⁻¹ to 1 mg ml⁻¹) and the plates were incubated for a further 24 h. Polymers P1
and P2 were applied in a diluted aqueous sodium chloride solution as used for the
charging/discharging tests, and the vanadium salts were used in a diluted sulfuric
acid solution with concentration ratios commonly used in vanadium-based RFBs
(1.5 mol l⁻¹ vanadium ion, 3.5 mol l⁻¹ sulfuric acid). Control cells were incubated
with fresh culture medium. Subsequently, the medium was replaced by a mixture
of fresh culture medium and Alamar-Blue solution (Life Technologies), prepared
according to the manufacturer's instructions. After a further incubation period of
4 h at 37 °C, the fluorescence was measured at excitation/emission wavelengths of
570 nm/610 nm, with untreated cells on the same well plate serving as negative
controls. The negative control was standardized as 0% of metabolism inhibition
and referred to as 100% viability. Cell viability below 70% was considered
indicative of cytotoxicity. Data are expressed as mean ± s.d. of three determinations.
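
The viability bookkeeping described above can be sketched as follows: fluorescence readings are normalized to the untreated (negative) control, summarized as mean and standard deviation, and compared with the 70% threshold. The function names and the readings are hypothetical, and no blank subtraction is applied because the text does not describe one.

```python
from statistics import mean, stdev


def viability_percent(sample_fluorescence, control_fluorescence):
    """Viability of each replicate relative to the mean negative control (= 100%)."""
    control_mean = mean(control_fluorescence)
    return [100.0 * f / control_mean for f in sample_fluorescence]


def summarize(sample_fluorescence, control_fluorescence, threshold=70.0):
    """Return (mean viability, sample s.d., cytotoxic flag) for one concentration."""
    values = viability_percent(sample_fluorescence, control_fluorescence)
    avg = mean(values)
    return avg, stdev(values), avg < threshold


# Hypothetical triplicate readings in arbitrary fluorescence units.
avg, sd, cytotoxic = summarize([600.0, 650.0, 700.0], [1000.0, 1000.0, 1000.0])
```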

The membrane-damaging properties of polymers were quantified by analysing
the haemoglobin release from erythrocytes by a haemolysis assay. Blood from
sheep, collected in heparinized tubes (Institut für Versuchstierkunde und
Tierschutz, Friedrich Schiller University Jena), was centrifuged at 4,500g for
5 min, and the pellet was washed three times with cold 1.5 mmol l⁻¹ phosphate
buffered saline (PBS; pH 7.4). After dilution with PBS in a ratio of 1:7, aliquots of
erythrocyte suspension were mixed 1:1 with the polymer solution and incubated in
a water bath at 37 °C for 60 min. After centrifugation at 2,400g for 5 min, the
haemoglobin release into the supernatant was determined spectrophotometrically
using a microplate reader (TECAN Infinite M200 PRO plate reader) at a
wavelength of 544 nm. Complete haemolysis (100%) was achieved using 1% Triton
X-100 serving as the positive control; PBS served as negative control (0%). A
haemolysis rate less than 2% was taken as non-haemolytic. Experiments were
run in triplicate and were performed with blood from three different animals.
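
A haemolysis rate on this scale can be computed by linear interpolation between the two controls, as sketched below. The interpolation formula is an assumption implied by the 0% (PBS) and 100% (Triton X-100) anchors rather than a formula stated in the text, and the function names and absorbance values are hypothetical.

```python
def haemolysis_percent(a_sample, a_negative, a_positive):
    """Haemolysis rate in %, scaled so that the PBS control is 0% and the
    Triton X-100 control is 100% (absorbance read at 544 nm)."""
    return 100.0 * (a_sample - a_negative) / (a_positive - a_negative)


def is_non_haemolytic(rate_percent, threshold=2.0):
    """Rates below 2% are taken as non-haemolytic."""
    return rate_percent < threshold


# Hypothetical absorbance readings.
rate = haemolysis_percent(a_sample=0.06, a_negative=0.05, a_positive=1.05)
```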
Extended Data Figure 1 | Schematic representation of the synthesis of
redox-active polymers. The cathode material P1 and anode material P2 were
prepared by free radical polymerization and subsequent polymer-analogous
oxidation and functionalization, respectively. ABCVA, 4,4'-azobis(4-cyanovaleric
acid); AIBN, azobisisobutyronitrile; DMSO, dimethyl sulfoxide.
Extended Data Figure 2 | Rheogram of redox-active polymers. The flow
behaviour of aqueous solutions of P1 and P2 was studied at 20 °C under
continuous shear in sodium chloride solution (1 mol l⁻¹) using a double-gap
measuring system in a rotational rheometer. The concentrations of the
polymers correspond to a charge-storage capacity of about 10 A h l⁻¹.
Extended Data Figure 3 | Test set-up for a polymer-based RFB.
a, Photograph of a laboratory set-up (5-cm² test cell, peristaltic pump, and
electrolyte reservoirs) used for charging/discharging experiments. b, Exploded-view
drawing of the 5-cm² test cell. c, Colour change of the polymer solutions
upon charging and discharging.
Extended Data Figure 4 | Crossover studies on dialysis membrane. a, Time-
dependent NaCl concentration of a chamber filled with deionized water that is
separated from a NaCl feed solution (1 mol l⁻¹) by a dialysis membrane. Salt
permeability was determined to be P_s = (9.3 ± 0.1) × 10⁻⁵ cm s⁻¹. b, Time-
dependent P1 concentration of an RFB test cell compartment filled with NaCl
solution (2 mol l⁻¹) that is separated from a P1 feed solution (60 mg ml⁻¹) by a
membrane. We determined the maximum polymer permeability from the
linear part of the diffusion graph to be P_polymer,max = (3.2 ± 0.3) × 10⁻⁷ cm s⁻¹
(diffusion coefficient D_polymer,max = (1.3 ± 0.1) × 10⁻⁷ cm² min⁻¹). The
minimum membrane selectivity S_min = 290. c, Time-dependent change in
P2 concentration of an RFB test cell compartment filled with NaCl solution
(2 mol l⁻¹) that is separated from a P2 feed solution (40 mg ml⁻¹) by a
dialysis membrane. P_polymer,max = (3.3 ± 0.4) × 10⁻⁸ cm s⁻¹;
D_polymer,max = (1.4 ± 0.2) × 10⁻⁸ cm² min⁻¹; S_min = 2,830. All crossover
experiments were conducted with a cellulose-based dialysis membrane
(MWCO = 6,000 g mol⁻¹) at 25 °C. In b and c, the slopes of the fit lines
correspond to dc_B/dt in the expression for P_s (see Methods). d, The number-
weighted distributions of hydrodynamic radii (⟨R_h⟩_n,app) of P1 and
P2 determined by dynamic light scattering reveal mean radii of approximately
2 nm (5 g l⁻¹ in 0.1 mol l⁻¹ NaCl solution). Inset, a comparison of the intensity-
and number-weighted distributions of P1 shows the presence of aggregates
(4.6 nm and 184 nm).

©2015 Macmillan Publishers Limited. All rights reserved 


[Panels a and b: x axis, potential vs. Ag/AgCl (V).]


Extended Data Figure 5 | Cyclic voltammogram of electrolytes after 10,000
cycles. a, b, The cyclic voltammogram (in water with 0.1 mol l⁻¹ NaCl; scan
rate of 200 mV s⁻¹) of samples taken from the anolyte (a) and catholyte
(b) after repeated charging/discharging; solid lines and dashed lines correspond
to the reductive and oxidative range, respectively, of the cyclic voltammogram.
Extended Data Figure 6 | Electrochemical analysis of P1 and P2. a, Cyclic | NaCl). The single-reduced species (Viol *’) is visible at —450 mV (orange line) 
voltammogram of the oxidation process of P1 (2.5 X 10° > moll~! in water with the formation of three distinct bands at 365 nm, 530 nm, and 900 nm, 
with 0.1 moll” * NaCl; scan rate of 200 mV s_*). b, Cyclic voltammogram of the _ which disappear upon further reduction towards the double-reduced species 
first reduction process of P2 (5.2 X 10° * moll! in water with 0.1 moll”'NaCl; —_ (Viol®). The shapes and positions of the emerging bands strongly suggest the 
scan rate of 200 mV s_'). c, Ultraviolet-visual spectroelectrochemistry of formation of radical cation dimers’’. Re-oxidation at 200 mV (dotted line) 

P2 at different applied potentials (as indicated) during consecutive reduction restores the initial spectrum. 

and after subsequent re-oxidation (10 * moll! in water with 0.1 moll”! 
Extended Data Figure 7 | Toxicity tests of the redox-active polymers. a, The
viability of L929 mouse cells was tested in the presence of redox-active
compounds according to ISO10993-5. Cell viability below 70% is considered
indicative of cytotoxicity. The negative control was standardized as 0% of
metabolism inhibition and referred to as 100% viability. Two vanadium salts in
redox states commonly found in vanadium-based RFBs and two widely used
cationic polymers were used as a reference. Poly(L-lysine), PLL, is a commercial
food preservative and branched poly(ethylene imine), bPEI, is used in the
paper-making industry and as a flocculating agent. Although P1 and
P2 show cytotoxic effects at concentrations >50 µg ml⁻¹ (with P1 being less
toxic than P2), the vanadium salts and the cationic polymers reveal cytotoxic
effects at lower concentrations (VOSO₄ > 5 µg ml⁻¹, VCl₃ > 10 µg ml⁻¹,
bPEI > 5 µg ml⁻¹, PLL > 25 µg ml⁻¹). Data are expressed as mean values and
error bars represent the standard deviation of three determinations. b, We
quantified the cell-membrane-damaging properties of the polymers by
analysing the haemoglobin release from erythrocytes (indicated by the
numbers associated with each bar). Data are expressed as mean values and error
bars represent the standard deviation of triplicates of three different blood
samples per concentration. Because a haemoglobin release (haemolysis rate)
less than 2% is considered non-haemolytic, P1 and P2 as well as the reference
compounds show no membrane-damaging behaviour. Hence, the cell-toxic
effects do not originate from damage to the cell membrane, but from reactions
within the cell. Because the cell uptake via diffusive processes of polymers is
hindered in comparison to 'small' inorganic ions, P1 and P2 possess lower
cytotoxicity. These tests provide some insight into the toxicity, but long-term
ecotoxicity tests and animal testing are required to fully evaluate the impact of
the redox-active polymers on wildlife and plants.
Extended Data Figure 8 | RDE measurements of P1 and analysis.

a, Voltammograms of P1 (2.5 X 10-? mol1~! in water with 0.1 moll~! NaCl; 
scan rate of 5 mV s⁻¹) at different rotation speeds (as indicated) from 400 r.p.m.
to 3,600 r.p.m. (The arrow indicates the direction of potential scanning). 

b, Levich plot from the obtained limiting currents; application of Levich 


equation yields a diffusion coefficient D = (7.0 + 0.5) X 10° cm?s 1. 

c, Koutecky-Levich plot for different overpotentials $\eta$ (as indicated) yielding
the mass-transfer-independent current $i_k$ (as $\omega^{-1/2} \to 0$, $i_k = i$). d, Tafel plot
yielding k° = (4.5 = 0.1) X10 *cms | and a = 0.68 + 0.03 (see Methods). 




[Plot panels: voltammograms at rotation speeds from 100 r.p.m. to 4,900 r.p.m., current (A) versus potential vs. Ag/AgCl (V); (current)^(-1) (A^(-1)) versus (rotation rate)^(-1/2) (s^(1/2) rad^(-1/2)).]

Extended Data Figure 9 | RDE measurements of P2 and analysis. 

a, Voltammograms of P2 (5.2 X 10°? moll! in water with 0.1 moll! NaCl; 
scan rate 5 mV s⁻¹) at different rotation speeds (as indicated) from 100 r.p.m.
to 4,900 r.p.m.; substantial changes of the limiting current, necessary for 
reasonable analysis, are observed only at rotation speeds >1,200 r.p.m. 

(The arrow indicates the direction of potential scanning). b, Levich plot for 
currents between the first and second steps of the two-step process; the 
Levich equation was applied only for high rotation rates, that is, in the region 
of substantial changes of the limiting current, yielding a diffusion coefficient
D=(7.6 + 0.9) X10 ’cm’s |. The fit curve and its slope correspond to 
the Levich equation (see Methods). c, Plot of $i^{-1}$ versus $\omega^{-1/2}$ for high
negative overpotentials $\eta_1$ (as indicated) yielding $1/i_k$ (as $\omega^{-1/2} \to 0$, $1/i_k = 1/i$).
d, Plot of $\log|1/i_k|$ versus $\eta_1$ (overpotential with respect to the first step)
yielding $-\log|2FAk^0c_0|$ ($\log|1/i_k|$ for $\eta_1 = 0$), which allows us to determine
ko =(9+2)X 10 °cms |. The error bars represent error that originates from 
the linear regression analysis of c. 
Extended Data Table 1 | (Semi)organic redox-flow batteries

Anode & cathode | Electrolyte | Current density [mA cm⁻²] | Energy density [Wh l⁻¹] | Cycles | Membrane | Source
[structure not recoverable] | [not recoverable] | 200-500 | 16 | 20 | Nafion | [6]
[further rows not recoverable; surviving fragments: (CH2CH2O)3CH3; PP separator]
P1 & P2 | NaCl/H2O | 40-100 | 10 | 10,000 | Dialysis | -


(PC = propylene carbonate; MC = mixture of organic carbonates; PP = polypropylene; PE = polyethylene) 


Typical performance data of literature examples (refs 6-9, 17-20, 37) of organic RFBs that use conventional membranes and electrolytes. 
Spontaneous droplet trampolining on rigid
superhydrophobic surfaces


Thomas M. Schutzius¹*, Stefan Jung¹*, Tanmoy Maitra¹, Gustav Graeber¹, Moritz Köhme¹ & Dimos Poulikakos¹


Spontaneous removal of condensed matter from surfaces is
exploited in nature and in a broad range of technologies to achieve
self-cleaning, anti-icing and condensation control. But
despite much progress, our understanding of the phenomena
leading to such behaviour remains incomplete, which makes
it challenging to rationally design surfaces that benefit from its
manifestation. Here we show that water droplets resting on
superhydrophobic textured surfaces in a low-pressure environment
can self-remove through sudden spontaneous levitation and
subsequent trampoline-like bouncing behaviour, in which sequential
collisions with the surface accelerate the droplets. These collisions
have restitution coefficients (ratios of relative speeds after and
before collision) greater than unity despite complete rigidity of
the surface, and thus seemingly violate the second law of
thermodynamics. However, these restitution coefficients result from an
overpressure beneath the droplet produced by fast droplet
vaporization while substrate adhesion and surface texture restrict vapour
flow. We also show that the high vaporization rates experienced
by the droplets and the associated cooling can result in freezing
from a supercooled state that triggers a sudden increase in
vaporization, which in turn boosts the levitation process. This
effect can spontaneously remove surface icing by lifting away
icy drops the moment they freeze. Although these observations
are relevant only to systems in a low-pressure environment, they
show how surface texturing can produce droplet-surface
interactions that prohibit liquid and freezing water-droplet retention
on surfaces.

We examine the motion of a water droplet (typical radius R₀ ≈ 0.1 cm)
initially resting on a superhydrophobic surface in a low-pressure
environment via high-speed imaging (Fig. 1a, b). The surface used is a
silicon micropillar array, a standard, well-controlled surface platform,
treated with a fluorosilane coating; the combination of texture and
surface chemistry makes the surface superhydrophobic (Fig. 1a,
inset, and Extended Data Table 1; the pillar diameter d, pitch l and
height h are 1.4 µm, 6.5 µm and 4.8 µm, respectively). While the
droplet is resting on this surface, sudden spontaneous droplet motion
and bouncing is observed from side-view imaging (Fig. 1a) once the
environmental pressure is reduced, here at a rate of −0.1 bar s⁻¹
to approximately 0.01 bar (low vacuum; see Supplementary Video 1).
Figure 1b quantifies the motion by plotting the droplet centroid
position y against time t. This behaviour is reminiscent of an
under-damped, forced, mass-spring-damper system operating at a
resonance condition (see Supplementary Video 2, Methods section
'Modelling droplet trampolining' and Extended Data Figs 1 and 2),
where mass loss from vaporization does not play a role (Methods
section 'Droplet mass loss'). The defining feature of the observed
behaviour is that each collision between the droplet and the fully
rigid substrate results in momentum gain, as in the case of a
bouncing trampolinist (although the trampolinist, in contrast to the
droplet, interacts with a fully elastic substrate). Similarities also exist with
the Leidenfrost effect, in that both involve a vaporizing droplet
supported on a vapour cushion with excess pressure caused by a
small droplet-substrate gap that restricts the draining-vapour flow;
but although for the Leidenfrost effect the thickness of this gap is
determined by balancing the droplet weight and pressure forces,
in our system it is determined by the surface texture, represented
by pillar height.

Figure 2b shows plots of y as a function of t and Fig. 2a the associated
image sequences for droplets with similar velocities v₁ impacting the
same superhydrophobic surface in standard-pressure (approximately
1.0 bar) and low-pressure (approximately 0.01 bar) environments
(see also Supplementary Video 3). The variables y and t are
non-dimensionalized with respect to the initial droplet radius R₀ and the
inertial-capillary timescale τ = √(m/σ), respectively, where m is the

Figure 1 | Droplet trampolining on a rigid
surface. a, High-speed image sequence showing
a droplet, initially at rest, trampolining (initial
droplet radius R₀ = 0.09 cm). Inset, micrograph
of the silicon superhydrophobic surface showing
its pillar texture. b, Droplet vertical position
y (blue circles, cross-sectional centroid) as a
function of time t for the image sequence in
a. The dotted lines in a correspond to y = 0
(Extended Data Fig. 1). For more details see
Supplementary Video 2.
Figure 2 | Vaporization can accelerate droplet recoil. a, Overlaid,
semitransparent image sequences of droplets impacting and recoiling from
a superhydrophobic surface under environmental conditions at low (blue
droplet) and standard (black droplet) pressure. b, Plot of y as a function of t
(non-dimensionalized with respect to the initial droplet radius R₀ and the
inertial-capillary timescale τ = √(m/σ), respectively) for the image
sequences in a (blue line, blue droplet; black line, black droplet). c, Image
sequences similar to those in a, but focusing on the contact line region.
d, Plot of R as a function of t (non-dimensionalized with respect to R₀ and τ,
respectively) for the image sequences in c (blue line, blue droplet; black line,
black droplet). Impact parameters: a, b, v₁ = −0.9R₀/τ; c, d, v₁ = −0.6R₀/τ.
Surface properties, same as Fig. 1. See Supplementary Videos 3 and 4 for
further information.


droplet mass and σ is surface tension. The data indicate that the impact
phase, where inertia is important (−v₁τ/R₀ ~ 1), is largely unaffected
by pressure; the opposite is true for the recoil dynamics. To better
illustrate this, Fig. 2c, d presents plots of droplet-substrate contact radius
R as a function of t for low- and standard-pressure cases, and the
associated image sequences (see Supplementary Video 4). These plots show
that the spreading of the contact line is largely unaffected by pressure,
whereas the recession of the contact line changes profoundly (see Fig.
2d). Under standard pressure, during the time Δt_r/τ (shaded yellow
region in Fig. 2d), where Δt_r is the duration of the contact-line recession
interval, R decreases linearly (slope of −0.6) as a result of inertial and
capillary forces balancing each other, and the rate of decrease is
consistent with that estimated from a previous retraction model. In the
low-pressure case, for the same time region, R decreases parabolically,
that is, in a continuously accelerating fashion. The acceleration of the
Figure 3 | The effect of microtexture on droplet impact and
trampolining dynamics in a low-pressure environment. a, Plots of ε and
Δp/(σR₀τ) versus −v₁(τ/R₀) for droplets impacting a superhydrophobic
surface. Error bars represent uncertainty of the measurement. b, Plot
of the probability of observing a trampolining event (Φ) versus the
force ratio 3Ca*(R/h)³/sin(θ_r*) for small droplets (2R₀ < 0.27 cm) on
superhydrophobic surfaces with similar pillar pitches and diameters, but
substantially different heights (blue bar, h = 4.8 μm; grey bar, h = 10.9 μm;
each bar represents an average of at least ten trials). The uncertainty of the
force ratio 3Ca*(R/h)³/sin(θ_r*) ranges from 17% to 22% (blue bar) and 23%
to 31% (grey bar) owing to uncertainties in R, h, P, J, θ_r* and temperature.
Surface properties: a, same as Fig. 1; b, [d, l, h] = [1.6, 6.5, 10.9] μm and
[1.4, 6.5, 4.8] μm.


contact-line recession estimated from Fig. 2d is −51R₀/τ², which is 22
times greater (in magnitude) than the acceleration experienced by the
droplet overall during the corresponding receding/recoil phase
(estimated from Fig. 2b to be 2.3R₀/τ²). Thus, this fast acceleration of the
contact line acts on the droplet only near the substrate.

The plot of the restitution coefficient ε = −v₂/v₁ (the ratio of outgoing
and incoming droplet velocities) as a function of v₁ in Fig. 3a makes
it clear that low-pressure conditions increase ε beyond unity for a wide
range of impact velocities studied here. The data reveal an inverse
dependence of ε on −v₁, and that a balance between added momentum
Δp and dissipation (and therefore ε ≈ 1) is achieved at −v₁ ≈ R₀/τ. In
Fig. 3a, experimentally determined values of Δp are plotted against v₁.
With these data, and by modelling the droplet impact process as a
partially inelastic collision that gives the net change in momentum as a
result of droplet recoiling (see Methods section ‘Inelastic collision
model’ and Extended Data Figs 3 and 4), we can quantify the overpressure
under the droplet and the resultant net force f with the approximation

$$
\Delta p = \int_0^{\Delta t_r} f\,\mathrm{d}t \approx \bar{f}\,\Delta t_r ,
$$

where Δt_r is the time during which a net
force is acting on the droplet (see Fig. 2d) and f̄ is the average force
over Δt_r. Substituting appropriate values for −v₁ = 0.6R₀/τ (the impact
velocity used in Fig. 2c, d; Δt_r/τ = 0.16) yields f̄ ≈ 2.2σR₀. Therefore,
for a typical impact velocity occurring during a trampolining event, the
average force acting on the droplet during the receding phase, which


results in momentum transfer in the y direction, is estimated to be 
Figure 4 | Freezing can trigger spontaneous droplet launching from a
wide range of materials and microtextures. a–c, Image sequences showing
water droplets solidifying on, and launching from, superhydrophobic
surfaces in an environment at standard temperature with low-pressure and
low-humidity conditions. The surfaces used were: a, silicon micropillar
([d, l, h] = [2.0, 4.6, 13.5] μm; see Supplementary Video 7 for more details);


greater than that due to surface tension. The corresponding average
overpressure, with respect to the maximum droplet-substrate contact
area (R_max = 0.63R₀; Extended Data Fig. 5), is ΔP ≈ 0.9(2σ/R₀), a
fraction of the Laplace pressure within the droplet.

To determine the physical origin of the force f, it is necessary to
consider the interplay between the dynamics of vapour flow and the
superhydrophobic surface texture (see Methods section ‘Droplet vaporization’
and Extended Data Fig. 6). At low pressure and humidity, the liquid
droplet has a relatively high vaporization flux J, and the mean free path
length λ is relatively high compared to the characteristic length scales of
the micropillar array, so slip effects become important (Knudsen
number, Kn = λ/h ≈ 1). If the vapour flux emitted from the droplet into the
micropillar array is high enough and the micropillar height small
enough, then the vapour will not be able to drain easily through the
surface texture (despite its open structure) and an overpressure will
result. At vaporization equilibrium, which occurs very rapidly with
respect to impact dynamics (see Methods section ‘Droplet vaporization’),
the quasi-steady-state condition before recession takes place is
represented by a balance between the internal pressure of the droplet (2σ/R₀)
and a resisting pressure due to viscosity. We estimate the magnitude of
the overpressure by balancing the pressure gradient driving vapour
drainage with the viscous stress resisting it, which, with R ≈ R₀, is
ΔP ≈ 6Ca*(R/h)³(2σ/R); here the modified capillary number is defined
as Ca* = Jμ/(4Fρ_vσ) and depends on the vapour viscosity μ, a slip factor F
that depends on Kn and wall geometry, and the vapour density ρ_v (ref. 27).
Substituting appropriate values yields ΔP = 4.7(2σ/R) (ref. 27). The
overpressure underneath the droplet that drives the trampolining behaviour
varies between this value of ΔP and approximately zero (at the droplet
periphery); we estimate it to be the average of those two values:
ΔP = 2.3(2σ/R). This estimate of overpressure is consistent with the
result obtained from the inelastic collision model, ΔP ≈ 0.9(2σ/R₀).

For the trampolining process to self-initiate, the overpressure due to 
vaporization must overcome substrate adhesion, which is estimated by 
b, etched aluminium (Supplementary Video 8); c, fluoropolymer-carbon-nanofibre
composite (Supplementary Video 9). Micrographs of the
three surfaces are given on the right. In c, thermographic image sequences 
are also shown, which are synchronized with the above optical image 
sequence from side-view (middle row of images) and top-view (bottom row 
of images) perspectives. 


the force ratio 3Ca*(R/h)³/sin(θ_r*), where θ_r* is the apparent receding
contact angle of the droplet on the superhydrophobic surface. So, for a
fixed set of conditions, one can readily satisfy the above criterion by
changing R/h. Such an analysis is a conservative estimate and is more
accurate for a highly porous surface texture (as most superhydrophobic
textures are), which satisfies the condition dh/l² < 1. From a scaling
perspective, this criterion suggests that h should be much less than R
but still greater than λ: λ < h ≪ R (see Methods section ‘Droplet
vaporization’). Whether or not sufficient overpressure was achieved to induce
spontaneous levitation in the low-pressure case is shown in Fig. 3b,
which plots the probability of observing a trampolining event Φ versus
3Ca*(R/h)³/sin(θ_r*). We used two superhydrophobic micropillar
surfaces with relatively constant pillar diameters and pitches, but with
substantially different heights, and kept the droplet sizes 2R₀ < 0.27 cm
to minimize gravitational effects. If the droplets are too large, then
they are likely to oscillate with a frequency that does not correspond
to their travel time in the air, which results in the droplets impacting
onto the substrate in an oblate condition. This type of droplet impact
can result in ε < 1 in spite of the fact that the droplet is vaporizing
(see Supplementary Video 1 and Methods section ‘Droplet
oscillations’). Although there are inherent experimental deviations,
1 < 3Ca*(R/h)³/sin(θ_r*) is a good predictor of whether or not
spontaneous droplet trampolining dynamics will occur. The data also
show that with shorter pillar heights, one can access a regime where
trampolining dynamics should occur practically every time (Φ = 1) for
a given droplet size. This observation makes it possible to design a
device that generates continuous mechanical motion by coupling the
droplet to a cantilever (see Methods section ‘Cantilever’, Supplementary
Video 5 and Extended Data Fig. 7).

Because droplet trampolining dynamics rely on the combined effect 
of droplet vaporization and substrate-liquid repellence, it would be 
interesting to explore whether liquids with higher vapour pressure and 
typically low surface tension can trampoline as well. Although achieving 
repellence for that class of liquids is outside the scope of this study
because it requires re-entrant superhygrophobic surface features,
Supplementary Video 6 documents relevant behaviour: a water- 
acetone droplet initially residing on a standard silicon micropillar 
superhydrophobic surface spontaneously levitates at a pressure similar 
to that when water droplets levitate, but exhibits only a few bounces 
and no sustained trampolining. This lack of trampolining is because 
the droplet required a relatively large vaporization rate to overcome 
adhesion with the present substrate, so it transitioned promptly to a 
Leidenfrost state. 

A very noticeable effect of the strong vaporization is the high degree 
of cooling experienced by the droplets in contact with the surface, 
which can even cause them to solidify in a recalescent manner (from a 
supercooled state) in their room-temperature (20-25 °C) environment. 
This effect is made possible by the relatively high vaporization flux and 
poor thermal-transport properties of liquid water (see Methods section 
‘Droplet freezing’), and is consistent with previous work that has shown
that evaporative cooling alone can induce a spontaneous recalescent
freezing at the free surface of a sessile droplet at the same supercooled
temperature as the flow. Furthermore, it has been shown that the
rapid recalescent partial solidification of supercooled sessile droplets
is accompanied by a sudden increase of the droplet temperature to its
equilibrium value for freezing (0 °C); because the environment is severely
undersaturated with respect to water vapour at this temperature, a
sudden (explosive) increase in vaporization from the droplet surface occurs.
These effects manifest themselves in the present work by rapidly increas- 
ing the overpressure under the droplet, which has substantial implica- 
tions for the ensuing droplet dynamics with respect to ice levitation. 

As shown in Fig. 4a and Supplementary Video 7, a sudden overpres- 
sure increase between the droplet and substrate, owing to increased 
evaporation as a result of droplet freezing, is capable of launching it 
away from the surface at the formative state of icing (see Extended Data 
Table 2). The phenomena reported in this study are inherent to droplet 
interactions with textured superhydrophobic surfaces. To underpin 
this statement, we also demonstrated spontaneous ice levitation on 
metallic (aluminium) surfaces (see Supplementary Video 8). Figure 4b 
shows an image sequence of a water droplet freezing and self-levitating 
from a superhydrophobic, etched aluminium substrate (see inset for a 
micrograph of the surface), demonstrating a behaviour identical to that 
of the silicon pillar surface of Fig. 4a. Finally, Fig. 4c shows an image 
sequence of freezing-driven levitation for a droplet initially in contact 
with a superhydrophobic, polymer nanocomposite surface (see inset 
for a micrograph of the fluoropolymer-carbon-nanofibre surface; see 
Supplementary Video 9 for a video of the image sequence). To quantify 
the temperature field during the recalescence phenomenon as it trig- 
gers ice levitation, we recorded a synchronized thermographic image 
sequence from side-view and top-view perspectives showing how the 
temperature of the droplet evolves throughout the freezing process. 
Initially the droplet is at room temperature; it becomes supercooled 
owing to vaporization; it then undergoes recalescent freezing (lasting 
approximately 10 ms), which results in a temperature increase as freez- 
ing rapidly spreads along the droplet surface, and triggers its almost 
immediate levitation. This unexpected ice-levitation process stems 
from the physics of recalescent freezing, in which the ice nucleation 
from a supercooled state, the freezing front propagation and the asso- 
ciated vaporization increase are all directly related to the removal of the 
as-formed ice from a superhydrophobic surface. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Droplet trampolining and imaging. The environmental chamber consisted of an 
aluminium-based pressure vessel connected to a vacuum pump, a pressure sensor 
and a pressurized nitrogen reservoir (see Extended Data Fig. 8). Temperature 
and humidity sensors were located at the centre of the vessel. Two transparent 
PMMA (poly(methyl methacrylate)) windows, one fitted on the front and one 
on the rear part of the vessel, facilitated the use of high-speed optical
visualization in combination with backlighting, which was arranged in line
with the lens of the camera, to visualize droplet trampolining. For each
experiment, a 1–10-μl deionized water droplet was placed on the sample
surface, and then the environmental pressure at the inner volume of the
vessel was reduced using a valve connected to the vacuum pump at a rate of
−0.1 bar s⁻¹. Throughout the experiments, the ambient pressure and
temperature inside the vessel were kept constant at chosen values
(approximately 0.01 bar and 20–30 °C, respectively).
Prior to initiating an experiment (that is, reducing environmental pressure), the
entire pressure chamber was purged with nitrogen to ensure a dry environment.
The droplet dynamics were captured by high-speed video recording at 500–50,000
frames per second. The same procedure was followed for the droplet-cantilever
experiments. The steel cantilever had a width, length and thickness of 1.2 cm,
5.1 cm and 80 μm, respectively.

To understand whether or not a droplet will trampoline on a given surface, we
provide the following example. For a surface with h ≈ 10 μm, one can obtain
information on the droplet sizes for which trampolining will occur from Fig. 3b.
We estimate the size of the droplet-substrate contact radius R at which
trampolining is expected to occur. For a surface with h = 10.9 μm, θ_r* = 154°
(Extended Data Table 1) and a vaporization flux of J = 6.9 × 10⁻⁴ g cm⁻² s⁻¹
(Extended Data Fig. 6), achieving 3Ca*(R/h)³/sin(θ_r*) = 3 (the condition where
trampolining is more likely to occur for 2R₀ < 0.27 cm) requires R/h = 67,
with Ca* = 1.5 × 10⁻⁶. The latter condition leads to R = 0.73 mm. For such a
contact radius, the typical value of the initial droplet radius is then
R₀ = 1.34 mm.
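
The worked example above can be checked numerically. The following is a minimal sketch, assuming the force ratio has the form 3Ca*(R/h)³/sin(θ_r*) and using the values of Ca*, h and θ_r* quoted in the example; the function names are hypothetical and introduced only for illustration.

```python
import math

def force_ratio(ca_star, r_over_h, theta_r_deg):
    """Force ratio 3 Ca* (R/h)^3 / sin(theta_r*), with the angle in degrees."""
    return 3.0 * ca_star * r_over_h**3 / math.sin(math.radians(theta_r_deg))

def contact_radius_um(target_ratio, ca_star, h_um, theta_r_deg):
    """Contact radius R (micrometres) at which the force ratio equals target_ratio."""
    sin_theta = math.sin(math.radians(theta_r_deg))
    r_over_h = (target_ratio * sin_theta / (3.0 * ca_star)) ** (1.0 / 3.0)
    return r_over_h * h_um

# Values quoted in the worked example.
CA_STAR = 1.5e-6      # modified capillary number
H_UM = 10.9           # pillar height, micrometres
THETA_R_DEG = 154.0   # apparent receding contact angle, degrees

R_um = contact_radius_um(3.0, CA_STAR, H_UM, THETA_R_DEG)
ratio_at_67 = force_ratio(CA_STAR, 67.0, THETA_R_DEG)

print(f"R/h for a force ratio of 3: {R_um / H_UM:.1f}")
print(f"R: {R_um / 1000.0:.2f} mm")
print(f"force ratio at R/h = 67: {ratio_at_67:.2f}")
```

The small difference between the computed R/h and the quoted value of 67 is within the rounding of the inputs.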

Vaporization rate. We determined the vaporization flux of a droplet in
contact with a superhydrophobic silicon micropillar surface ([d, l, h] = [1.4,
6.5, 18.2] μm) as a function of environmental pressure in a dry environment
by determining the cross-sectional area of the droplet in time using ImageJ
software (see Methods section ‘Droplet vaporization’). For the low-pressure
conditions of this study, we determined the vaporization flux for
millimetre-scale droplets on a superhydrophobic surface with a solid/air
fraction of φ = 0.04 to be 0.69 mg cm⁻² s⁻¹.

Silicon micropillar surface. A polished boron-doped p-type (100) silicon wafer
with areal dimensions of 1.5 × 1.5 cm² and a thickness of 500 ± 25 μm was used
as the substrate. Photolithography was performed using an AZ 1505 and an AZ
6612 positive photoresist with a Karl Suss MA6 mask aligner; material removal
was performed using inductive-coupling plasma etching with SF₆/C₄F₈ (Bosch
Process in Alcatel AMS 200 machine) to fabricate a regular and well defined
micropillar surface structure. To lower the surface energy of the textured surface, 
rendering it hydrophobic, a layer of 1H,1H,2H,2H-perfluorodecyltrichlorosilane 
was applied by liquid-phase self-assembly. For full details on the characteristics 
of these surfaces (for example, geometry, wettability and so on), see Extended 
Data Table 1. 

Etched aluminium surface. To generate the superhydrophobic aluminium surface, 
we used a procedure inspired by ref. 31. An aluminium substrate with areal
dimensions of 2 × 2 cm² (weight fractions: Al, 99.58%; Si, 0.1%; Fe, 0.12%;
Cu, 0.03%; Mn, 0.02%; Mg, 0.02%; Zn, 0.03%; Ti, 0.02%; Ga, 0.03%; and V, 0.05%;
AW 1085 from Bronmetal) was initially cleaned under sonication in acetone,
isopropyl alcohol and deionized water for 10 min each. Subsequently, to remove
the native oxide layer, the substrate was treated with a 1 wt% sodium hydroxide
solution for 10 min. Thereafter, the aluminium substrate was etched with a 1 M
ferric chloride solution for 25 min at 50 °C. During the etching step, the aluminium
substrate was cleaned with isopropyl alcohol for 2–3 min to avoid precipitation of
ferric hydroxide on the surface. Finally, to impart hydrophobicity, the etched
aluminium surface was treated with a 1.43 mM solution of
trichloro-1H,1H,2H,2H-perfluorodecylsilane in n-hexane for 2 h followed by
heating for 45 min at 120 °C.
For details on the wettability and morphological characteristics of these surfaces, 
see Extended Data Table 1. 

Polymer nanocomposite. To generate the superhydrophobic polymer nano- 
composite coating, we used a procedure inspired by ref. 32. To begin, an aqueous 
fluoroacrylic copolymer dispersion (PMC, 20 wt% in water; Capstone ST-100, 
DuPont) was diluted with acetic acid (0.4 wt% PMC in acetic acid/water);
separately, a suspension of carbon nanofibre particles (CNF; diameter ≈ 100 nm
and length ≈ 20–200 μm, >98% carbon basis; Sigma Aldrich) in acetic acid was
generated (CNF 2.0 wt% in acetic acid). Both solutions were separately subjected
to 30 min of ultrasonic probe sonication (Vibracell VCX 130, 130 W, 20 kHz).
The CNF and PMC dispersions were then combined and mechanically mixed at
room temperature to generate the final dispersion. The final solid weight ratio of
PMC to CNF was 1:5; the total solid concentration (CNF + PMC) in the dispersion
was 1.1 wt%. The PMC-CNF dispersion was then probe-sonicated for 30 min. Finally,
the dispersion was spray-deposited with a siphon-feed air brush onto standard glass 
slides from a distance of approximately 10 cm and the coatings were placed on a 
hot plate (approximately 100°C) for several minutes to facilitate the removal of 
residual solvents. The morphological and wettability characteristics of the surface 
are shown in Extended Data Table 1. 

Surface characterization. For surface morphology characterization, we used 
a scanning electron microscope (Zeiss ULTRA 55); we applied no conductive 
coatings to facilitate imaging. We performed advancing and receding con- 
tact angle measurements with a backlit image acquisition setup (goniome- 
ter) consisting of a syringe pump (for dispensing and withdrawing volume of 
the droplet on the substrate) and a detector (Thorlabs, DCC1645C) affixed 
with a standard zoom lens (Thorlabs, MVL7000) for the purposes of droplet 
visualization. 

Modelling droplet trampolining. When the droplet is in contact with the
substrate, the droplet trampolining behaviour can be described as a standard
mass-spring-damper (MSD) system with a forcing function as depicted in Extended
Data Fig. 1 (for further details, see Supplementary Video 2 and Fig. 1). When the
droplet is not in contact with the substrate, it is governed by projectile motion.
Therefore, we describe the dynamics of this entire system as

$$
\begin{aligned}
y &= y_0 + v_0 t - \tfrac{1}{2} g t^2 && \text{if } y > 0 \\
m\frac{\mathrm{d}^2 y}{\mathrm{d}t^2} + c\frac{\mathrm{d}y}{\mathrm{d}t} + f_k(y) &= f(t) - mg && \text{if } y < 0
\end{aligned}
\tag{1}
$$

where m is the droplet mass, y is the vertical position of the cross-sectional
centroid of the droplet, t is time, c is the damping constant, f_k(y) is the force
due to the ‘stiffness’ of the droplet, f(t) is a forcing function, v is velocity and
g is gravitational acceleration (zero subscripts denote initial values). Selecting
appropriate scales, we transform equation (1) into non-dimensional form

$$
\begin{aligned}
y^* &= y_0^* + v_0^* t^* - \tfrac{1}{2} Bo\, t^{*2} && \text{if } y^* > 0 \\
\frac{\mathrm{d}^2 y^*}{\mathrm{d}t^{*2}} + 2\zeta\frac{\mathrm{d}y^*}{\mathrm{d}t^*} + f_k^* &= f^*(t^*) - Bo && \text{if } y^* < 0
\end{aligned}
\tag{2}
$$

where Bo = mg/(σR₀) is the gravitational Bond number, R₀ is the initial droplet
radius, σ is surface tension, 2ζ = c/√(σm) defines the damping ratio ζ and
τ = √(m/σ) is the inertial-capillary timescale. We write the dimensionless
variables (indicated by asterisks) as

$$
\begin{aligned}
y &= R_0\, y^* \\
t &= \tau\, t^* \\
v &= v^* R_0/\tau \\
f_k &= \sigma R_0\, f_k^* \\
f(t) &= \sigma R_0\, f^*(t^*)
\end{aligned}
$$
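
The step from equation (1) to equation (2) can be made explicit. The following is a sketch of the substitution for the in-contact branch, assuming the right-hand side of equation (1) is f(t) − mg and using only the scales defined above.

```latex
% Substitute y = R_0 y^*, t = \tau t^*, f_k = \sigma R_0 f_k^*, f = \sigma R_0 f^*:
\frac{m R_0}{\tau^2}\frac{\mathrm{d}^2 y^*}{\mathrm{d}t^{*2}}
  + \frac{c R_0}{\tau}\frac{\mathrm{d}y^*}{\mathrm{d}t^*}
  + \sigma R_0 f_k^* = \sigma R_0 f^*(t^*) - mg

% Divide by \sigma R_0 and use \tau = \sqrt{m/\sigma}, so that m/(\sigma\tau^2) = 1:
\frac{\mathrm{d}^2 y^*}{\mathrm{d}t^{*2}}
  + \underbrace{\frac{c}{\sqrt{\sigma m}}}_{2\zeta}\frac{\mathrm{d}y^*}{\mathrm{d}t^*}
  + f_k^* = f^*(t^*) - \underbrace{\frac{mg}{\sigma R_0}}_{Bo}

% The projectile branch scales the same way, since g\tau^2/R_0 = mg/(\sigma R_0) = Bo:
y^* = y_0^* + v_0^* t^* - \tfrac{1}{2} Bo\, t^{*2}
```

The same Bond number therefore sets both the static load on the droplet ‘spring’ and the curvature of the free-flight trajectory.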


We distinguish three cases: 0 < ζ < 1 (under-damped), ζ = 1 (critically damped)
and ζ > 1 (over-damped). In the traditional MSD system, the critically damped
and over-damped cases have no harmonic component in their solutions and,
therefore, they are not relevant to the problem here. In our study, we estimate that
ζ = 0.11 (see Methods section ‘Inelastic collision model’), indicating that we have
an under-damped condition; this is the value used to illustrate the phenomenon
for the duration of this section.

As was mentioned above, each trampolining cycle is governed by both MSD
dynamics (y* < 0) and projectile motion (y* > 0); however, it is the forcing-function
frequency matching the MSD frequency (and the fact that this force only acts
when y* < 0) that is responsible for the trampolining behaviour. To fully
represent the droplet trampolining as an MSD system, a representative model of
droplet stiffness needs to be developed.

We develop a comprehensive model for stiffness that takes into account the 
change in shape of the droplet as it deforms. Previous approaches have provided 
simplified and rigorous descriptions of droplet elasticity. Our aim is to derive
a simple model describing the force due to surface tension, as a function of droplet 
deformation in the direction parallel to droplet transport, where the force causing 
deformation is due to inertia. 
If one assumes that at the moment of maximum deformation the droplet takes
the shape of an incompressible flat cylinder of radius W, height H and volume V,
then f_k* is

$$
f_k^* = 2\pi\left[\frac{1}{(y^* + 1)^{1/2}} - \frac{1}{(y^* + 1)^{2}}\right]
$$

where y = (V/(2π))^(1/3) y*, which we refer to as the ‘cylinder stiffness model’. For the
theoretical cylinder geometry, we use an effective radius R₀ from the volume
relation V = 2πR₀³, which leads to y = R₀y* (2W = H = 2R₀ in the undeformed case).
The above equation implies that f_k* depends nonlinearly on y*, indicating that the
droplet behaves like a spring with variable stiffness.
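
One way to see where a nonlinear stiffness of this kind comes from is to differentiate the surface energy of the cylinder with respect to the centroid position. The sketch below rests on assumptions that are not stated in the text: the whole cylinder surface (both end faces and the side wall) carries the surface tension σ, and y* = 0 corresponds to the undeformed cylinder with H = 2R₀.

```latex
% Geometry at fixed volume V = \pi W^2 H = 2\pi R_0^3, with centroid y = H/2 - R_0:
H = 2R_0\,(y^* + 1), \qquad W^2 = \frac{V}{\pi H} = \frac{R_0^2}{y^* + 1}

% Surface energy of the cylinder (two end faces plus side wall):
E = \sigma\left(2\pi W^2 + 2\pi W H\right)
  = 2\pi\sigma R_0^2\left[\frac{1}{y^* + 1} + 2\,(y^* + 1)^{1/2}\right]

% Force conjugate to the centroid position, scaled by \sigma R_0:
f_k^* = \frac{1}{\sigma R_0}\frac{\mathrm{d}E}{\mathrm{d}y}
      = 2\pi\left[\frac{1}{(y^* + 1)^{1/2}} - \frac{1}{(y^* + 1)^{2}}\right]
      \approx 3\pi\, y^* \quad (|y^*| \ll 1)
```

Under these assumptions the force vanishes at y* = 0, is linear for small deformation, and stiffens rapidly under compression (y* → −1).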

The final variables required to solve equation (2) are the initial velocity v₀, the
initial droplet position y₀ (typically zero) and the expression for f(t). For the case
where no force is applied, that is, a droplet is impacting a non-wetting surface
with negligible force generation from evaporation effects, f(t) = 0. If f(t) ≠ 0, that
is, a droplet is impacting a non-wetting surface under low-pressure and
low-humidity conditions that drive force-generating evaporation, then we define
the force as the piecewise-continuous function

$$
f(t) =
\begin{cases}
0 & \text{if } \mathrm{d}y/\mathrm{d}t < 0,\ y(t) < 0 \\
\bar{f} & \text{if } \mathrm{d}y/\mathrm{d}t > 0,\ y(t) < 0
\end{cases}
$$

where f̄ is the average value of the force applied to the droplet during the recoiling
stage of the droplet impact process. The above equation states that a force is only
being applied to the droplet, in a positive direction, once the droplet is in the
recoiling stage of the impact process.
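
To illustrate how the pieces of the model fit together, the following is a minimal numerical sketch of the non-dimensional MSD-projectile system. Several ingredients are assumptions made for illustration rather than details confirmed by the text: the stiffness function (a cylinder-type form with a 2π prefactor), the Bond number (computed here for a water droplet with R₀ = 0.09 cm), the semi-implicit Euler integrator and all function names. The damping ratio ζ = 0.11 and the recoil force of 0.8σR₀ are the values quoted for the MSD model.

```python
import math

ZETA = 0.11    # damping ratio quoted in the text
F_BAR = 0.8    # average recoil force in units of sigma * R0 (quoted for the MSD model)

# Assumed for illustration: Bond number of a water droplet with R0 = 0.09 cm (CGS units).
RHO, SIGMA, G, R0 = 1.0, 72.0, 981.0, 0.09
BO = (RHO * 4.0 / 3.0 * math.pi * R0**3) * G / (SIGMA * R0)

def stiffness(y):
    """Assumed cylinder-type stiffness f_k*(y*); the 2*pi prefactor is an assumption."""
    return 2.0 * math.pi * ((y + 1.0) ** -0.5 - (y + 1.0) ** -2.0)

def simulate(f_bar, t_end=30.0, dt=1e-3, y0=0.0, v0=0.0):
    """Integrate the model with semi-implicit Euler and return the list of y* values."""
    y, v = y0, v0
    ys = [y]
    for _ in range(int(t_end / dt)):
        if y > 0.0:
            # Free flight: only gravity acts.
            a = -BO
        else:
            # In contact: spring, damper, gravity, and forcing during recoil only.
            forcing = f_bar if v > 0.0 else 0.0
            a = forcing - BO - 2.0 * ZETA * v - stiffness(y)
        v += a * dt
        y += v * dt
        ys.append(y)
    return ys

ys_unforced = simulate(0.0)
ys_forced = simulate(F_BAR)

print(f"Bond number: {BO:.2f}")
print(f"max y* without forcing: {max(ys_unforced):.3f}")
print(f"max y* with forcing:    {max(ys_forced):.3f}")
```

In this sketch a droplet released at rest never leaves the surface when f̄ = 0, whereas with the recoil force switched on it is launched above y* = 0, which is the qualitative distinction the forcing function is meant to capture.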

Extended Data Figure 2a presents a plot of a full solution for equation (2) com- 
pared against experimental data. It is clear that there is a marked deviation between 
the experimental and theoretical results in terms of projectile motion for the first 
levitation sequence, which is due to the choice of v₀*(t* = 0): in the experimental
case, in the first few bounces, although the droplet does not reach the theoretical
maximum heights, it stores a substantial amount of energy in the form of inherent
capillary waves from its motion, which ultimately contribute to the lower values
of y*_max. This is demonstrated by the inset of Extended Data Fig. 2a, which shows
an image of an elongated water droplet at the moment of impact at t/τ = 2.2. This
same behaviour is true for successive impact sequences where t/τ < 20. (Fig. 1
shows images of elongated water droplets at the moment of impact that correspond 
to the sequence shown in Extended Data Fig. 2a). For higher droplet impact veloc- 
ities, the role of capillary waves in storing kinetic energy is reduced, as shown at 
later bounces in Extended Data Fig. 2a, where the droplet bouncing heights are 
noticeably higher and the theoretical and experimental cases match very closely. 
Extended Data Fig. 2b shows the force profiles required to induce trampolining 
behaviour for the theoretical case in Extended Data Fig. 2a. The magnitude of the 
average force acting on the droplet during the recoiling stage of droplet impact is 
of the same scale as the force due to surface tension; this result is further validated
in Methods section ‘Droplet vaporization’.

By capturing the magnitude, direction and timing of force application on a
simple, under-damped MSD system with a spring that has variable stiffness, we
have reproduced the experimentally observed behaviour relatively well considering 
the model simplicity, particularly the periodic, quasi-steady-state condition. This 
outcome underpins the claim that the trampolining behaviour of the droplet can 
be described by an under-damped, forced, MSD-projectile system operating at 
resonance. 

Droplet mass loss. One hypothesis for a droplet impact event resulting in
ε = −v₂/v₁ > 1 (ratio of recoiling and impacting velocities) that must be
considered is that the droplet loses an appreciable amount of mass during the
time of the impact event, that is, the impacting masses of the droplet in two
successive periods are not equal (m₂ ≠ m₁). Assuming negligible momentum loss
due to viscous dissipation in the air, we estimate the expected value of ε due to
mass loss to be ε ≈ m₁/m₂; therefore, to achieve a typical restitution coefficient
of ε = 1.24, (m₁ − m₂)/m₁ ≈ 0.2 would be required, that is, the droplet should
shed 20% of its mass during one trampolining period. For the experiments
performed in this study, we estimate the mass loss of a droplet during a single
trampolining cycle as

$$
\frac{m_1 - m_2}{m_1} = \frac{3 J\,\Delta t}{\rho R_1}
$$

where R₁ is the initial (at the start of the trampolining cycle) radius of the droplet,
J is the vaporization flux of the droplet, Δt is the time it takes to complete one
trampolining cycle and ρ is the density of the droplet. In Methods section ‘Droplet
vaporization’, the vaporization flux of the droplet on a superhydrophobic surface
is measured to be J ≈ 6 × 10⁻⁴ g cm⁻² s⁻¹; Extended Data Fig. 2 shows that Δt ≈ 5τ.
So, for a water droplet with R₁ ≈ 0.1 cm (m₁ = ρ(4/3)πR₁³), we calculate the
inertial-capillary timescale to be τ = √(m/σ) = 7.6 ms, where σ is surface tension
(ρ = 1.0 g cm⁻³, σ = 72 dyn cm⁻¹). Substituting the above values yields
(m₁ − m₂)/m₁ ≈ 7 × 10⁻⁴, that is, the droplet sheds only 0.07% of its mass during
a single trampolining cycle, which indicates that the experimentally observed
vaporization flux is about 290 times less than that required to induce trampolining
by the mass-loss mechanism.
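
The arithmetic of this estimate can be reproduced in a few lines. This is a minimal sketch using the values quoted above (CGS units); the variable names are illustrative only.

```python
import math

RHO = 1.0          # droplet density, g cm^-3
SIGMA = 72.0       # surface tension, dyn cm^-1
R1 = 0.1           # droplet radius at the start of a cycle, cm
J = 6e-4           # vaporization flux, g cm^-2 s^-1
CYCLE_IN_TAU = 5.0 # duration of one trampolining cycle in units of tau

m1 = RHO * (4.0 / 3.0) * math.pi * R1**3   # droplet mass, g
tau = math.sqrt(m1 / SIGMA)                # inertial-capillary timescale, s
dt_cycle = CYCLE_IN_TAU * tau              # one trampolining cycle, s

# Fraction of mass lost per cycle: (m1 - m2)/m1 = 3 J dt / (rho R1)
loss_fraction = 3.0 * J * dt_cycle / (RHO * R1)

# Fraction that the mass-loss hypothesis would need (value quoted in the text).
required_fraction = 0.2
shortfall = required_fraction / loss_fraction

print(f"tau = {tau * 1e3:.1f} ms")
print(f"mass lost per cycle = {loss_fraction * 100:.2f} %")
print(f"required / observed = {shortfall:.0f}")
```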

Inelastic collision model. For the inelastic collision model presented within the
main text (Fig. 3), one of the most important unknown parameters is the damping
ratio ζ, which is a measure of energy dissipation as a function of impact velocity.
We estimate an average value of ζ for the superhydrophobic surface and impact
velocities of interest to this study. To place this value of ζ into context, we
compare it with values obtained in a previous study (ref. 19) for an
ultralow-hysteresis superhydrophobic surface, which is a good measure of the
absolute reachable lower limit for ζ. We
then determine a reasonable estimate of the additional momentum (per cycle) 
required to sustain droplet trampolining. 

The damping ratio can be estimated experimentally by knowing the coefficient
of restitution ε = −v₂/v₁, the impact velocity v₁ and the droplet-substrate contact
time t_c, and is defined as ζ = 0.5(1 − ε)τ/t_c, where τ = √(m/σ) is the
inertial-capillary timescale, m is the droplet mass and σ is the coefficient of
surface tension. For the case of a droplet impacting a surface with negligible
contact angle hysteresis at relatively low impact speeds, Richard and Quéré
(ref. 19) suggest that the maximum value of restitution should be
ε = √(5/6) ≈ 0.91, owing to kinetic energy being
converted to vibrations. Because we are tracking the translational motion of the
droplet, if kinetic energy is converted into droplet oscillations, then effectively that
will appear as a loss of energy in terms of lower droplet speed, particularly for the
cases where the droplet spends a large amount of time in the air and the oscillations
have sufficient time to dissipate.

Extended Data Figure 3a is a plot of ε versus v₁ for two different surfaces:
(1) the surface used in this study (cos(θ_r*) − cos(θ_a*) = 0.14); and (2) an
ultralow-hysteresis superhydrophobic surface (cos(θ_r*) − cos(θ_a*) = 0.02;
ref. 19). (Here θ_r* and θ_a* are the receding and advancing contact angles,
respectively.) For the surface used in this study, the average value of the
restitution coefficient for the range of impact velocities of interest is
ε = 0.71, much smaller than the theoretical limit
(owing to much larger contact-angle hysteresis). For droplet impact experiments
in a low-pressure environment, contact-line pinning is not observed because the
dewetting process occurs almost instantaneously with respect to the recoiling of
the droplet (see Supplementary Videos 3 and 4); therefore, one should expect that
for such a superhydrophobic surface with relatively higher contact-angle
hysteresis, the value of ζ that is measured in an environment at standard pressure
would be an overestimation for the same experiment in a low-pressure environment
(see Extended Data Fig. 3b). Furthermore, because contact-line pinning does not
occur in a low-pressure environment, the impact behaviour of a droplet on a
superhydrophobic surface should tend towards the ideal case described in ref. 19
(ε → 0.91). In this case, if the droplet contact time is the so-called minimum
contact time (t_c/τ = √(6π)/4 ≈ 1.09; Extended Data Fig. 4), which is defined as
the lowest-order oscillation period for a spherical droplet, then one should expect
ζ ≈ 0.04, 40% of the average value determined experimentally for the ambient case
(ζ ≈ 0.11; see Extended Data Fig. 3b). Using the magnitude/range of ζ estimated
from the ambient case and the theoretical considerations, the positive change in
momentum as a result of the droplet impact event, Δp, is determined as a function
of v₁ (Fig. 3a). For the purposes of conservatively estimating Δp and modelling the
process as an MSD system, we chose to use the upper value of ζ = 0.11 throughout
the manuscript. Choosing the lower value would result in a pre-factor adjustment
to the estimation of Δp; however, it would not change the order-of-magnitude
estimate.

We write an approximate definition of Ap as 


Ap T 
Mi 
oRyt Ry 


where Δp = ∫_0^{Δt_f} f dt is the positive change in momentum (from a force f being
applied while the contact line recedes), t_c is the droplet-substrate contact time
and Δt_f is the time where a net force is acting on the droplet (see Fig. 2d). By
knowing ε, ζ and t_c as functions of v_i for droplet impact experiments performed
in a dry low-pressure environment, we determine Δp, as is done in Fig. 3a, and
therefore estimate f.

We determined Δp as a function of v_i, and then an average force, by re-writing
the definition of Δp as Δp = ∫_0^{Δt_f} f dt = f̄Δt_f. Substituting appropriate values
(−v_i = 0.6R₀/τ, Δp = 0.35σR₀τ, Δt_f = 0.16τ) yields f̄ ≈ 2.2σR₀. (It is instructive to
compare the magnitude of this force with the MSD model in Extended Data Fig. 2,
which estimated that the average force acting on the droplet during a similar recoil
phase of the droplet impact process (−v_i = 0.5R₀/τ) was f̄ ≈ 0.8σR₀ for Δt_f = 0.41τ;
therefore, Δp = 0.33σR₀τ, which compares well with the above value.) This force
can be transformed into pressure by using the maximum droplet-substrate contact
area (ΔP = f̄/(πR_max²)); the desired parameter, R_max, is shown in Extended Data
Fig. 5 as a function of v_i, with its behaviour being consistent with that reported
previously. Substituting appropriate values yields ΔP = (2.2σR₀)/(πR_max²) or
ΔP = 0.9(2σ/R₀) (f̄ ≈ 2.2σR₀ and R_max = 0.63R₀), which represents a fraction of
the Laplace pressure in the droplet.
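
The arithmetic behind the average force and the equivalent pressure can be checked with a short sketch. It works in the dimensionless units used above (forces in σR₀, times in τ, lengths in R₀); the variable names are illustrative and not taken from the original analysis.

```python
import math

# Dimensionless inputs quoted in the text.
delta_p = 0.35     # momentum change, in units of sigma * R0 * tau
delta_t_f = 0.16   # duration of the net force, in units of tau
r_max = 0.63       # maximum contact radius, in units of R0

# Average force from delta_p = f_bar * delta_t_f, in units of sigma * R0.
f_bar = delta_p / delta_t_f

# Pressure over the maximum contact area, in units of sigma / R0.
pressure = f_bar / (math.pi * r_max**2)

# The same pressure expressed as a fraction of the Laplace pressure 2 * sigma / R0.
laplace_fraction = pressure / 2.0

print(f"f_bar ~ {f_bar:.2f} sigma*R0")
print(f"Delta P ~ {laplace_fraction:.2f} (2 sigma / R0)")
```

Both numbers round to the values quoted in the text (about 2.2σR₀ and about 0.9 of the Laplace pressure).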

Droplet vaporization. For droplets evaporating in a low-pressure, low-humidity
environment when in contact with a superhydrophobic surface, it is possible to
estimate the rate of evaporation by measuring how the cross-sectional area changes
in time. If we assume that the cross-sectional area measured is equivalent to that
of a sphere cross-section (negligible gravitational effects, Bo = mg/(R₀σ) < 1;
ideally non-wetting, θ* = 180°), we determine an effective radius as a function of
time, R₀(t). (Here θ* is the apparent contact angle that the droplet forms with the
substrate.) The rate of change of radius is then used to determine the vaporization
flux. Using the liquid density ρ, we obtain the vaporization flux

$$J = -\rho\,\frac{\mathrm{d}R_0}{\mathrm{d}t}$$



Extended Data Figure 6 shows the values of J for millimetre-scale water droplets
on a superhydrophobic surface with a similar wetting fraction to the surfaces
used throughout this study (φ = 0.04), as a function of environmental pressure.
It is clear that for environmental pressures below 0.1 bar, the vaporization flux 
increases markedly. We also found that for the early stage of droplet evaporation 
(up to about 30 s), the vaporization flux of the droplets does not change appreci- 
ably, which justifies its treatment as a constant value for the subsequent analysis. 

Now that the vaporization flux has been estimated, we consider the behaviour 
of vapour flowing into the micropillar array. The Knudsen number determines 
whether or not a gas can be treated as a continuous medium, which is important 
for this problem owing to the low-pressure environment and the small feature sizes 
of the micropillar array. It is defined as the ratio of the mean free path λ to a
characteristic length scale of the system. Here, because the region of interest is
within the micropillar array, the characteristic length scale should relate to the
micropillar length scales. For the purposes of understanding the main mechanism of how
pressure is distributed beneath the droplet, we choose to treat the flow there as a
one-dimensional channel, which is possible if the surface satisfies the criterion
dh/l² < 1, that is, it contains sparsely spaced pillars with small diameters. The
most important length scale is the height of the pillars h, which defines the scale
of the gap in which the vapour flows; therefore, Kn = λ/h. The mean free path is a
function of pressure, and is determined by

$$\lambda = \frac{k_B T}{\sqrt{2}\,\pi d_v^2 P}$$

where k_B is the Boltzmann constant, T is the temperature, d_v is the effective
diameter of the vapour molecule (Lennard-Jones parameter) and P is the absolute
pressure. We estimate the mean free path as λ = 12 µm (k_B = 1.38 × 10⁻¹⁶ erg K⁻¹;
T = 273 K; d_v = 2.64 × 10⁻⁸ cm (ref. 38); P = 0.01 bar) and, with a pillar height of
h = 4.8 µm, Kn = 2.5. Because the vapour pressure of water under standard
conditions is 0.032 bar, even if a region of the chamber is at saturation conditions,
the value of Kn computed above should be similar.
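
As a check on the numbers above, the following sketch evaluates the kinetic-theory mean free path and the Knudsen number in CGS units. The inputs are the values quoted in the text; the function and variable names are illustrative only.

```python
import math

K_B = 1.38e-16     # Boltzmann constant, erg K^-1
T = 273.0          # temperature, K
D_V = 2.64e-8      # effective molecular diameter, cm
P = 0.01 * 1.0e6   # 0.01 bar expressed in dyn cm^-2 (1 bar = 1e6 dyn cm^-2)
H = 4.8e-4         # pillar height, cm


def mean_free_path(temperature, diameter, pressure):
    """Mean free path k_B T / (sqrt(2) pi d^2 P), returned in cm."""
    return K_B * temperature / (math.sqrt(2.0) * math.pi * diameter**2 * pressure)


lam = mean_free_path(T, D_V, P)
kn = lam / H

print(f"mean free path ~ {lam * 1.0e4:.1f} um")
print(f"Kn ~ {kn:.2f}")
```

The result is a mean free path of roughly 12 µm and Kn of roughly 2.5, in line with the estimate in the text.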

For a steady-state, incompressible flow in a one-dimensional channel (of channel
height h) in the x direction with slip-velocity boundary conditions, the volumetric
flow rate (in units of cm² s⁻¹) is

$$Q = -\frac{h^3}{12\mu}\,\frac{\partial P}{\partial x}\left[1 + 12\left(\frac{2}{Y} - 1\right)\frac{\lambda}{h}\right] \qquad (3)$$

where (1 − Y) represents the fraction of molecules colliding with a wall that are
reflected. The second term in the square brackets represents a modification due
to slip effects, and it is customary to define

$$F = 1 + 12\,Kn\left(\frac{2}{Y} - 1\right) \qquad (4)$$

For air flowing through round glass tubes and Kn < 1.0, Brown et al. suggest
that Y = 0.84. By taking this value and Kn = 2.5 and substituting into equation
(4), we see that F = 43. Hence, the volumetric flow rate (equation (3)) is
Q = −(h³F/(12μ))(∂P/∂x). If the pressure drop is linear from the centre of the
droplet to the edge, we estimate ∂P/∂x ≈ −ΔP/R. Substituting and rearranging yields

$$\Delta P \approx \frac{12\mu R Q}{h^3 F} \qquad (5)$$


We define Q in terms of J by accounting for the geometry of the droplet and the
underlying substrate and defining an average velocity within the channels ū_v. From
conservation of mass, we see that ū_v = (J/ρ_v)(R/(4h)), where ρ_v is the vapour
density (found from the ideal gas law, ρ_v = P(M/ℛ)(1/T), with M denoting the
molar mass and ℛ the universal gas constant) and R is the contact radius between
the droplet and the substrate. The volumetric flow rate (in units of cm² s⁻¹) is then
Q = ū_v h; therefore, equation (5) becomes

$$\Delta P \approx \frac{3\mu R^2 J}{h^3 F \rho_v} \qquad (6)$$

By assuming that R ≈ R₀ and substituting appropriate values (μ = 1.0 × 10⁻⁴ P;
R = 0.050 cm; J = 6.87 × 10⁻⁴ g cm⁻² s⁻¹; h = 4.8 × 10⁻⁴ cm; F = 43;
ρ_v = 7.94 × 10⁻⁶ g cm⁻³), we see that ΔP ≈ 4.7(2σ/R). On the basis of this, the
average overpressure under the drop is ΔP̄ ≈ 2.3(2σ/R). This value is markedly
larger than the average pressure rectified into the vertical motion of the droplet
(see Methods section 'Inelastic collision model', ΔP ≈ 0.9(2σ/R₀)), which shows that
this overpressure is the origin of the force driving the trampolining behaviour.
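
The overpressure estimate can be reproduced with a minimal sketch in CGS units. The surface tension and molar mass of water are not stated in this passage and are assumptions here (σ = 72 dyn cm⁻¹, M = 18.02 g mol⁻¹); all other inputs are the values quoted in the text, and the function names are illustrative.

```python
# Values quoted in the text (CGS units).
MU = 1.0e-4        # vapour viscosity, P
R = 0.050          # contact radius, cm
J = 6.87e-4        # vaporization flux, g cm^-2 s^-1
H = 4.8e-4         # pillar height, cm
F = 43.0           # slip-correction factor quoted in the text
P = 0.01 * 1.0e6   # 0.01 bar in dyn cm^-2
T = 273.0          # temperature, K

# Assumptions, not given in this passage.
SIGMA = 72.0       # surface tension of water, dyn cm^-1 (assumed)
M_WATER = 18.02    # molar mass of water, g mol^-1 (assumed)
R_GAS = 8.314e7    # universal gas constant, erg mol^-1 K^-1


def vapour_density(pressure, molar_mass, temperature):
    """Ideal-gas vapour density in g cm^-3."""
    return pressure * molar_mass / (R_GAS * temperature)


def overpressure(mu, radius, flux, height, slip_factor, rho_v):
    """Centre-to-edge pressure difference 3 mu R^2 J / (h^3 F rho_v), dyn cm^-2."""
    return 3.0 * mu * radius**2 * flux / (height**3 * slip_factor * rho_v)


rho_v = vapour_density(P, M_WATER, T)
delta_p = overpressure(MU, R, J, H, F, rho_v)
ratio = delta_p / (2.0 * SIGMA / R)

print(f"rho_v ~ {rho_v:.2e} g cm^-3")
print(f"Delta P ~ {ratio:.1f} x (2 sigma / R)")
```

With these inputs the vapour density comes out close to the quoted 7.94 × 10⁻⁶ g cm⁻³ and the pressure ratio close to 4.7; the ratio shifts slightly if a different surface tension is assumed.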

One important aspect of the behaviour of the droplet is how strongly the pressure
difference depends on pillar height and droplet radius. Taking R ≈ R₀ (contact
radius and droplet radius are similar), and non-dimensionalizing the pressure
difference from equation (6) against Laplace pressure, yields

$$\frac{\Delta P}{2\sigma/R_0} \approx \frac{6\,Ca}{F}\left(\frac{R_0}{h}\right)^2 = 6\,Ca^*\left(\frac{R_0}{h}\right)^3 \qquad (7)$$

where Ca = μū_v/σ = (μJ/(4ρ_vσ))(R₀/h) and Ca* = μJ/(4Fρ_vσ) represent the capillary
and modified capillary numbers for this problem, respectively. This equation shows
that for droplet trampolining to occur, for a given droplet size (for example,
R ≈ 0.1 cm), the pillar height should be about two orders of magnitude less than the
radius of the droplet to ensure that a sufficient pressure difference can build up.
Furthermore, this analysis holds when the vapour flow can be treated as a continuum;
therefore, for these specific low-pressure conditions, h > 12 µm (Kn < 1). So, for a
typical millimetre-scale droplet, a pillar height of about 10 µm should be sufficient
to promote trampolining behaviour. For this process to be spontaneous, the pressure
difference should be sufficient to overcome the force due to adhesion, which for a
superhydrophobic surface is f_a = 2πRσ sin(θ_r*), where θ_r* is the apparent receding
contact angle of a droplet on a surface. By recalling the definition for the pressure
difference (ΔP = 3μR²J/(h³Fρ_v)), averaging it and projecting it onto the contact
area of the surface, we determine the force due to vaporization as

$$f_v = \overline{\Delta P}\,\pi R^2 \approx \frac{3\pi\mu R^4 J}{2 h^3 F \rho_v}$$

By taking the ratio of the two forces, we develop an approximate criterion for the
initiation of trampolining behaviour

$$\frac{f_v}{f_a} \approx \frac{3\,Ca^*}{\sin(\theta_r^*)}\left(\frac{R_0}{h}\right)^3 > 1$$

This criterion is similar to equation (7), with the exception of a 1/(2 sin(θ_r*))
term (for hydrophobic surfaces, 1/sin(θ_r*) > 1). So, a simple, order-of-magnitude
design rule for trampolining dynamics on superhydrophobic micropillar arrays
can be summarized as: for a droplet of a given radius R₀ that forms a contact radius
with a substrate of R, the pillar height should satisfy λ < h ≪ R.
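
To make the scaling concrete, the sketch below evaluates the modified capillary number, the scaled overpressure 6Ca*(R/h)³ and the force ratio 3Ca*(R/h)³/sin(θ_r*) for a few pillar heights. It rests on several assumptions: σ = 72 dyn cm⁻¹ is not given in this passage, θ_r* = 154° is the receding angle listed for the 4.8 µm pillar surface in Extended Data Table 1, and F is held at the quoted value of 43 even though it depends on h through Kn. The function names are illustrative.

```python
import math

# Values quoted in the text (CGS units).
MU = 1.0e-4       # vapour viscosity, P
J = 6.87e-4       # vaporization flux, g cm^-2 s^-1
F = 43.0          # slip-correction factor (held fixed here; it varies with h)
RHO_V = 7.94e-6   # vapour density, g cm^-3
R = 0.050         # contact radius, cm

# Assumptions, not given in this passage.
SIGMA = 72.0          # surface tension of water, dyn cm^-1 (assumed)
THETA_R_DEG = 154.0   # apparent receding contact angle, degrees (assumed)


def modified_capillary_number(mu, flux, slip_factor, rho_v, sigma):
    """Ca* = mu J / (4 F rho_v sigma)."""
    return mu * flux / (4.0 * slip_factor * rho_v * sigma)


def scaled_overpressure(ca_star, radius, height):
    """Overpressure relative to the Laplace pressure, 6 Ca* (R/h)^3."""
    return 6.0 * ca_star * (radius / height) ** 3


def force_ratio(ca_star, radius, height, theta_r_deg):
    """Vaporization force over adhesion force, 3 Ca* (R/h)^3 / sin(theta_r)."""
    return 3.0 * ca_star * (radius / height) ** 3 / math.sin(math.radians(theta_r_deg))


ca_star = modified_capillary_number(MU, J, F, RHO_V, SIGMA)
for h_um in (4.8, 10.0, 20.0):
    h_cm = h_um * 1.0e-4
    print(h_um, scaled_overpressure(ca_star, R, h_cm),
          force_ratio(ca_star, R, h_cm, THETA_R_DEG))
```

The cubic dependence on R/h is the point of the exercise: doubling the pillar height lowers both quantities by a factor of eight, which is why the criterion is so sensitive to h.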

We use this analysis to determine the so-called drainage timescale t_d. If this
timescale is much smaller than the timescale of droplet impact and recoil τ, then
we can treat the drainage process as a steady-state problem and the estimated value
of ΔP will be a good approximation. This timescale is estimated as the ratio of the
length traversed to the average gas velocity, t_d ≈ R/ū_v. Substituting appropriate
values yields t_d/τ = 0.003; therefore, the drainage process is two orders of
magnitude faster than the natural timescale of the system (impact and recoil). It is
also instructive to compare t_d with the crashing (impacting) time of the droplet,
τ_c ≈ −R₀/v_i. Substituting appropriate values yields t_d/τ_c = 0.004 (R₀ = 0.1 cm;
−v_i = 19.7 cm s⁻¹); therefore, the drainage process is over two orders of magnitude
faster than the crashing timescale of the system.
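
The two timescale ratios can be reproduced with the sketch below. Several inputs are assumptions rather than statements from this passage: the "appropriate values" are taken to be the J, ρ_v, h and contact radius quoted earlier, the natural timescale is taken as τ = √(m/σ) for a spherical water droplet, and σ = 72 dyn cm⁻¹ and a liquid density of 1 g cm⁻³ are assumed.

```python
import math

# Values quoted in the text (CGS units).
J = 6.87e-4        # vaporization flux, g cm^-2 s^-1
RHO_V = 7.94e-6    # vapour density, g cm^-3
H = 4.8e-4         # pillar height, cm
R_CONTACT = 0.050  # contact radius, cm
R0 = 0.1           # droplet radius, cm
V_IMPACT = 19.7    # impact speed |v_i|, cm s^-1

# Assumptions, not given in this passage.
SIGMA = 72.0       # surface tension of water, dyn cm^-1 (assumed)
RHO_L = 1.0        # liquid water density, g cm^-3 (assumed)

# Average vapour velocity in the pillar gap, from conservation of mass.
u_v = (J / RHO_V) * (R_CONTACT / (4.0 * H))

# Drainage time: contact radius divided by the average vapour velocity.
t_d = R_CONTACT / u_v

# Natural timescale, assumed here to be tau = sqrt(m / sigma).
mass = (4.0 / 3.0) * math.pi * R0**3 * RHO_L
tau = math.sqrt(mass / SIGMA)

# Crashing (impacting) time of the droplet.
tau_c = R0 / V_IMPACT

print(f"t_d / tau   ~ {t_d / tau:.4f}")
print(f"t_d / tau_c ~ {t_d / tau_c:.4f}")
```

Under these assumptions the ratios round to 0.003 and 0.004, matching the values in the text.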

Droplet oscillations. If the droplet oscillation and the time that the droplet is in
the air are not synchronized, then the droplet can impact the substrate in a (for
example) sufficiently oblate condition. This can result in an inefficient transfer
of surface energy to kinetic energy, as well as premature dewetting of the
substrate, causing the restitution coefficient to be less than unity, as shown in
Supplementary Video 1. The premature dewetting results from an enhanced
droplet-substrate contact area, causing a relatively high overpressure, which
forces the droplet from the surface at the moment of maximum droplet deformation,
reminiscent of so-called 'pancake bouncing'. Such behaviour can be achieved
by using droplets with diameters that approach the capillary length L_cap, because
the time that the droplet is in the air is never long enough to fully dissipate its
oscillations (Supplementary Video 1). Also, for early trampolining dynamics, adjacent
cycles have periods with differing lengths, because the droplet continuously
spends more and more time in the air (jumping higher and higher), as shown by
Fig. 1, so a situation where the free oscillation frequency of the droplet, which
is a fixed quantity, becomes out of phase with the time the droplet spends in the
air may occur.

Large droplets (h ≪ R ≈ L_cap) are likely to oscillate with a frequency that does
not correspond to the time that the droplet is in the air, resulting in droplets
impacting the substrate in an oblate condition. This type of droplet impact can
result in ε < 1 in spite of the fact that the droplet is vaporizing; however, the
robustness of the trampolining phenomenon is also apparent in Supplementary Video 1:
when a droplet impacts with ε < 1 the process is subsequently shown to recover
and return to bouncing with ε > 1.

Cantilever. Because the phenomenon of droplet trampolining has a natural frequency
(of the order of 1/τ) and a predictable force (about f̄), we can further
quantify and potentially exploit its manifestation for the purposes of inducing
continuous motion in a simple device for mechanical power production. To
demonstrate this, consider the image sequence in Extended Data Fig. 7a and the
associated plot of beam position δ as a function of t in Extended Data Fig. 7b, where
a single droplet is attached to a thin, metallic, cantilever beam, and it impacts a 
superhydrophobic surface in a cyclic manner. This system generates continuous 
sinusoidal motion for about 400 cycles, after starting spontaneously from rest, 
compared to about 3 cycles for standard pressure conditions, which required an 
initiation pulse (see Supplementary Video 5). Localizing and harnessing the tram- 
polining behaviour under the cantilever beam, the power of this phenomenon is 
visualized in terms of continuous generation of kinetic energy, which drives the 
cantilever motion well past its natural oscillation, also shown in Supplementary 
Video 5. Furthermore, Supplementary Video 5 shows a single half-cycle of beam 
oscillation with high temporal resolution, and captures the marked dewetting 
behaviour of the droplet, underpinning the above claims. 

Droplet freezing. Owing to the large vaporization rates experienced by the droplet,
the relatively low thermal conductivity of water and the low liquid-solid wetting
fraction φ of the surface texture, a large temperature difference can develop across
the droplet. We estimate the magnitude of the temperature difference for a
water droplet on a silicon micropillar surface using a one-dimensional model,
which balances the rate of heat absorbed at the surface of the droplet owing to
evaporation, q ≈ 4πR₀²JΔH_vap (here ΔH_vap is the enthalpy of vaporization),
with the rates of heat transferred to the interior of the droplet and to the substrate
(heat losses to the environment are assumed to be negligible). We estimate
the thermal resistance of the droplet, by treating it as a shell with an inner (R₁)
and outer (R₂) radius, as ℜ_w = (1/R₁ − 1/R₂)/(4πk_w), and that of the composite
air-micropillar region as ℜ_s = h/(k_e πR²), where k_w is the effective thermal
conductivity of the water droplet and the effective thermal conductivity of the
air-micropillar region is defined as k_e = φk_Si + (1 − φ)k_air, with k_Si and k_air the
thermal conductivities of silicon and air, respectively. Substituting appropriate
values yields k_e = 0.06 W cm⁻¹ K⁻¹ (φ = 0.04, k_air = 2.6 × 10⁻⁴ W cm⁻¹ K⁻¹,
k_Si = 1.48 W cm⁻¹ K⁻¹) (ref. 40). We compare the magnitudes of these individual
resistances by substituting appropriate values, which yields ℜ_w = 133 K W⁻¹
and ℜ_s = 3 K W⁻¹ (k_w = 6 × 10⁻³ W cm⁻¹ K⁻¹, h = 1.3 × 10⁻³ cm, R₀/R = 2.1,
R₀ = R₂ = 0.1 cm and taking R₂/R₁ = 2) (ref. 40). Therefore, it is reasonable to expect
that a much larger temperature difference will manifest itself across the droplet
during vaporization than across the textured surface. With the resistance values,
we estimate the temperature difference across the thin outer layer of the droplet to
be ΔT ≈ −qℜ_w. Substituting appropriate values yields ΔT ≈ −28 K, so the estimated
temperature difference is indeed substantial (J = 0.69 × 10⁻³ g cm⁻² s⁻¹,
ΔH_vap = 2,441 J g⁻¹) (ref. 40).
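
The thermal-resistance estimate can be checked with a short sketch. All inputs are the values quoted in the text, with one assumption stated loudly: the pillar height is taken as h = 1.3 × 10⁻³ cm, which corresponds to the 13.3 µm pillars listed in the Extended Data captions. The variable names are illustrative.

```python
import math

# Values quoted in the text (W, cm, K, g, s).
PHI = 0.04          # liquid-solid wetting fraction
K_SI = 1.48         # thermal conductivity of silicon, W cm^-1 K^-1
K_AIR = 2.6e-4      # thermal conductivity of air, W cm^-1 K^-1
K_W = 6.0e-3        # thermal conductivity of water, W cm^-1 K^-1
H = 1.3e-3          # pillar height, cm (assumed: 13.3 um pillars)
R0 = 0.1            # droplet radius, cm
R_CONTACT = R0 / 2.1   # contact radius from R0 / R = 2.1, cm
R2 = R0             # outer shell radius, cm
R1 = R2 / 2.0       # inner shell radius from R2 / R1 = 2, cm
J = 0.69e-3         # vaporization flux, g cm^-2 s^-1
DH_VAP = 2441.0     # enthalpy of vaporization, J g^-1

# Effective conductivity of the air-micropillar layer (area-weighted).
k_e = PHI * K_SI + (1.0 - PHI) * K_AIR

# Conduction resistance of a spherical shell of water.
res_w = (1.0 / R1 - 1.0 / R2) / (4.0 * math.pi * K_W)

# Conduction resistance of the textured layer under the contact area.
res_s = H / (k_e * math.pi * R_CONTACT**2)

# Evaporative heat rate over the droplet surface and the resulting temperature drop.
q = 4.0 * math.pi * R0**2 * J * DH_VAP
delta_t = -q * res_w

print(f"k_e ~ {k_e:.3f} W cm^-1 K^-1")
print(f"R_w ~ {res_w:.0f} K/W, R_s ~ {res_s:.1f} K/W")
print(f"Delta T ~ {delta_t:.0f} K")
```

The droplet resistance is about 40 times that of the textured layer, so nearly all of the temperature drop of roughly 28 K falls across the droplet itself.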
Extended Data Figure 1 | Schematic idealizing the droplet trampolining
phenomenon as a hybrid MSD-projectile system. MSD and projectile
motion apply when y < 0 and y > 0, respectively. The variables are mass m,
droplet 'stiffness' k, damping coefficient c, initial droplet velocity v₀, droplet
impact velocity v₁ and droplet recoil velocity v₂; f(t) is the forcing function.
The horizontal dashed line indicates where y is zero.
Extended Data Figure 2 | Comparing experimental and theoretical results for
droplet trampolining. Quantities plotted are dimensionless. a, Plot of y as a
function of t for experimental (blue circles) and theoretical (black line) cases.
Inset, the droplet at the moment of impact (note that it is non-spherical). b, The
applied force f required to generate the theoretical solutions in a as a function
of time t. The magnitude of this force f was determined iteratively by matching
the value of ε from the theory with that from the corresponding experiments. The
impact properties for the droplet shown in a are Bo = mg/σR₀ = 0.42 and
v_i = −0.5R₀/τ (first impact). The properties of the superhydrophobic surface were
[d, l, h] = [1.4, 6.5, 4.8] µm.
Extended Data Figure 3 | Determining the damping ratio ζ for droplets
impacting superhydrophobic surfaces under standard pressure
conditions. a, b, Plots of −v_iτ/R₀ versus ε (a) and ζ (b) as determined from
experiments on superhydrophobic surfaces. Square symbols represent
experiments performed in this work (advancing and receding contact
angles θ_a* = 161° ± 3°, θ_r* = 150° ± 4°; [d, l, h] = [1.5, 6.5, 13.3] µm); errors
represent the standard deviation of the measurement and triangles are
experimental data from ref. 19 (θ* ≈ 170° ± 3° and θ_a* − θ_r* ≈ 5°). In a, the
dashed green line represents the theoretical upper limit for ε for droplet
impact (√(5/6)); the solid black line is the average value of ε from the
experiments performed in this study. In b, the solid black and dashed green
lines represent the average values of ζ obtained from experiments in this
study and the theoretical lower limit of ζ, respectively. The theoretical lower
limit is estimated using ε = √(5/6) and t_c/τ = 1.09 (ref. 19). Error bars
in the plots represent measurement uncertainty.
Extended Data Figure 4 | The role of environmental pressure on the
contact time of a droplet with a superhydrophobic substrate for a single
impact cycle. Plot of droplet-substrate contact time t_c/τ versus −v_iτ/R₀
for water droplets impacting a superhydrophobic surface with a wetting
fraction of φ = 0.04 under low-pressure (circles) and standard-pressure
(squares) conditions. The properties of the superhydrophobic surfaces were
[d, l, h] = [1.5, 6.5, 13.3] µm (squares) and [d, l, h] = [1.4, 6.5, 4.8] µm
(circles). The horizontal dashed line denotes the so-called minimum
contact time t_c/τ ≈ 1.09. Error bars in the plots represent measurement
uncertainty.
Extended Data Figure 5 | Spreading behaviour of a water droplet. Plot
of R_max/R₀ versus −v_iτ/R₀ for droplets impacting onto a superhydrophobic
surface in a low-pressure, low-humidity environment. The properties of the
superhydrophobic surface were [d, l, h] = [1.4, 6.5, 4.8] µm. Error bars in
the plots represent measurement uncertainty.
Extended Data Figure 6 | The role of environmental pressure on the
vaporization flux of a water droplet in a low-humidity environment.
Plot of vaporization flux J versus environmental pressure P for a
millimetre-scale water droplet in contact with a superhydrophobic surface.
The properties of the surface used were [d, l, h] = [1.4, 6.5, 18.2] µm.
Error bars for P and J represent the uncertainty of the measurement and
s.d., respectively. Each data point is the average of five measurements.
Extended Data Figure 7 | Exploiting trampolining dynamics with a
cantilever. a, Overlaid image sequence (20 ms between the two images)
of a droplet attached to a cantilever beam of length L exploiting droplet
trampolining to create mechanical motion. b, Plot of beam deflection
δ as a function of t for a similar sequence to that in a. See Supplementary
Video 5 for further details. The properties of the surface used were
[d, l, h] = [1.5, 6.5, 13.3] µm.
Extended Data Figure 8 | Schematic showing the environmental
chamber used throughout the study. We generated dry conditions in the 
chamber with nitrogen (N2), and the pressure was reduced with a vacuum 
pump. The front and back of the chamber were equipped with transparent 
windows that were removable to facilitate placement of substrates and 
droplets. The coordinates XC, YC and ZC are denoted by blue, red and 
green, respectively. 
Extended Data Table 1 | Experimental details on the engineered surfaces used in this study

| Type | Processing technique | Specifications | φ | θ_a* (°) | θ_r* (°) |
|---|---|---|---|---|---|
| Silicon micropillar | Lithography and etching | [d, l, h] = [2.0, 4.6, 13.5] µm | 0.15 | 162 ± 1 | 146 ± 5 |
| Silicon micropillar | Lithography and etching | [d, l, h] = [1.5, 6.5, 13.3] µm | 0.04 | 161 ± 3 | 150 ± 4 |
| Silicon micropillar | Lithography and etching | [d, l, h] = [1.6, 6.5, 10.9] µm | 0.05 | 161 ± 2 | 154 ± 3 |
| Silicon micropillar | Lithography and etching | [d, l, h] = [1.4, 6.5, 4.8] µm | 0.04 | 161 ± 2 | 154 ± 2 |
| Silicon micropillar | Lithography and etching | [d, l, h] = [1.6, 6.5, 3.5] µm | 0.05 | 164 ± 2 | 161 ± 5 |
| Silicon micropillar | Lithography and etching | [d, l, h] = [1.4, 6.5, 18.2] µm | 0.04 | 166 ± 1 | 162 ± 1 |
| Etched aluminum | Etching | - | - | 155 ± 2 | 152 ± 3 |
| Polymer nanocomposite | Spray coating | Length of carbon nanofiber: 20-200 µm | [illegible] | [illegible] | [illegible] |

For micropillar surfaces, the pillar diameter, pitch and height are given by [d, l, h], respectively. The liquid-solid area fraction is φ. The apparent advancing and receding contact angles are θ_a* and θ_r*, respectively. All scale bars are 4 µm.
Extended Data Table 2 | Experimental probability of ice levitation as a function of droplet size on the CNF-PMC coating under dry,
low-pressure conditions for an environment at room temperature

| R₀,min [mm] | R₀,max [mm] | Number of trials [num] | Probability of ice levitation [-] |
|---|---|---|---|
| 0.65 | 0.74 | 5 | 0.2 |
| 0.88 | 1.18 | 5 | 1.0 |
| 1.30 | 1.33 | 5 | 0.8 |
| 1.47 | 1.51 | 5 | 1.0 |
| 1.59 | 1.69 | 5 | 0.8 |
Rhodium-catalysed syn-carboamination of alkenes
via a transient directing group


Tiffany Piou¹ & Tomislav Rovis¹


Alkenes are the most ubiquitous prochiral functional groups— 
those that can be converted from achiral to chiral in a single 
step—that are accessible to synthetic chemists. For this reason, 
difunctionalization reactions of alkenes (whereby two functional 
groups are added to the same double bond) are particularly import- 
ant, as they can be used to produce highly complex molecular 
architectures’”. Stereoselective oxidation reactions, including 
dihydroxylation, aminohydroxylation and halogenation*, are well 
established methods for functionalizing alkenes. However, the 
intermolecular incorporation of both carbon- and nitrogen-based 
functionalities stereoselectively across an alkene has not been 
reported. Here we describe the rhodium-catalysed carboamination 
of alkenes at the same (syn) face of a double bond, initiated by a 
carbon-hydrogen activation event that uses enoxyphthalimides as 
the source of both the carbon and the nitrogen functionalities. The 
reaction methodology allows for the intermolecular, stereospecific 
formation of one carbon-carbon and one carbon-nitrogen bond 
across an alkene, which is, to our knowledge, unprecedented. The 
reaction design involves the in situ generation of a bidentate direct- 
ing group and the use of a new cyclopentadienyl ligand to control 
the reactivity of rhodium. The results provide a new way of synthes- 
izing functionalized alkenes, and should lead to the convergent and 
stereoselective assembly of amine-containing acyclic molecules. 

Functional groups that are based on nitrogen are prominent in 
biologically relevant molecules’, and stereoselective chemical methods 
for introducing nitrogen atoms into organic molecules are the subject 
of intense interest. Alkene hydroamination—the addition of a nitrogen 
and hydrogen across a carbon-carbon double bond—is an emerging 
technology for introducing nitrogen functionality (Fig. 1a)*7°. 
However, the incorporation of carbon-based coupling partners is 
more limited, despite the crucial role of reactions that form carbon- 
carbon bonds in chemical synthesis. Among these, Heck-type 
approaches are noteworthy for their ability to introduce a carbon 
fragment in a stereoselective manner under typically mild condi- 
tions'’’*. But both the hydroamination and the Heck-type reactions 
have the same strategic drawback: only one end of the alkene is 
functionalized. Simultaneous incorporation of both carbon- and 
nitrogen-based functionalities (carboamination) across an alkene 
would address this deficiency. 

Established stereoselective carboamination reactions are limited 
and fall into three categories (Fig. 1b). Of these, annulative reactions 
are popular and powerful but deliver a cyclic product, which limits 
their impact'?*. A handful of intramolecular approaches have also 
been developed, wherein one of the reacting partners is tethered to 
the alkene’*"””. Finally, there is a growing subset of radical-based reac- 
tions, which functionalize both ends of the alkene in a carboamination 
process'*”’. However, the involvement of radicals means that the ste- 
reochemistry present in the alkene starting material is typically lost. 
Here, we describe the stereoselective intermolecular carboamination of 
alkenes, using enoxyphthalimides as the source of both the carbon and 
the nitrogen atoms (Fig. 1c). In the presence of an Rh(III) catalyst, these
precursors undergo stereospecific syn addition (addition to the same 


face of a double bond) to a variety of disubstituted alkenes, delivering 
acyclic products containing two contiguous stereocentres in an inter- 
molecular fashion. 

We have previously shown that enoxyphthalimides undergo 
Rh(III)-catalysed reactions with electron-deficient alkenes to deliver
cyclopropane adducts (Fig. 2)”. The mechanism proposed involves 
the generation of intermediate A, the product of carborhodation of the 
alkene partner. We hypothesized that the Rh atom is coordinatively 
unsaturated and thus ligates the enol alkene fragment, which subse- 
quently undergoes migratory insertion to form the carbon-carbon 
bond in the cyclopropane product. Should the Rh atom instead be 
coordinatively saturated, intramolecular alkene coordination should 
be disfavoured, and reductive elimination to form the carboamination 
product might be favoured. Coordinative saturation of the Rh atom 
could conceivably occur by intramolecular coordination to a bidentate 
directing group. 

Figure 1 | Carboamination reactions. a, Transition-metal-catalysed
difunctionalization of alkenes. Previously, such reactions could reliably achieve
the introduction of either nitrogen-based or carbon-based functional groups
(left-hand reaction); known reactions that introduce both groups across a
single alkene (carboamination reactions, right, dotted arrow) have drawbacks.
M_cat, metal-based catalyst; R, functional group. b, The previously known
carboamination reactions in organic synthesis: annulation reactions,
intramolecular reactions and radical reactions, all of which have limitations.
c, Our proposed Rh(III)-catalysed intermolecular syn-carboamination of
alkenes. Ar, aromatic groups; Ph, phenyl; Phth, phthalimide.

Figure 2 | Working hypothesis: tuning of the directing group to influence
reactivity. The ligands on the Rh(III) catalyst are omitted for clarity. Ar,
aromatic group; L, exogenous nucleophile. A monodentate directing group
allows for intramolecular coordination of the alkene to Rh (intermediate
A), leading to cyclopropanation. A bidentate directing group, generated in situ,
occupies the last coordination site on Rh (intermediate B), leading to reductive
elimination to form a carboamination product.

Our past efforts to install requisite bidentate directing groups on
the enoxyamine were frustrated by the instability of the product. We
overcame this instability by generating a bidentate directing group
in situ using a more nucleophilic solvent such as methanol, which we
hypothesized would open the phthalimide to form the phthalimide-derived
amido ester. Under these conditions, the formation of the
carboamination product 3aa is favoured over the cyclopropane 4aa
in a 2.8/1 ratio (Table 1, entry 2). We also observed the formation of the
product 5aa, derived from the opening of the phthalimide ring.
Fortunately, the product 5aa could be converted back to 3aa without
erosion of diastereoselectivity, simply by heating the crude reaction
mixture at 60 °C in toluene after consumption of the starting material
1a (entry 3). Furthermore, we established that 3aa was formed as a
single diastereoisomer, the relative configuration being unambiguously
assigned by X-ray crystallography and consistent with a syn-addition
process, thereby confirming our initial hypothesis. Selectivity between
3aa and 4aa, however, remained less than optimal.

Building on our previous work on cyclopentadienyl ligands, we
speculated that control of the chemoselectivity could be achieved
through ligand design. Disappointingly, however, when using the
monosubstituted cyclopentadienyl isopropyl (Cp^iPr) ligand, which
performed well in the cyclopropanation reaction, the carboamination
product 3aa is not formed (Table 1, entry 4). Sterically hindered
(1,3-di-tert-butyl-cyclopentadienyl, Cp^t) (ref. 22) or electron-deficient
(trifluoromethyl-tetra-methyl-cyclopentadienyl, Cp*^CF3) (ref. 23) ligands
furnish compound 3aa in poor yields (entries 5 and 6). But the
pentasubstituted ligand cyclohexyl-tetra-methyl-cyclopentadienyl (Cp*^Cy)
gives the desired product 3aa with 69% yield and good chemoselectivity
(3aa/4aa = 8.0/1; Table 1, entry 7). Further increasing the
steric hindrance of the cyclopentadienyl ligand (tert-butyl-tetra-
methyl-cyclopentadienyl, Cp*^tBu) allows the formation of 3aa with
an increased yield (72%) and slightly better chemoselectivity (3aa/
4aa = 8.4/1; entry 8). Finally, replacing the base caesium acetate with
caesium adamantylcarboxylate significantly improves the chemoselectivity
(3aa/4aa = 14.8/1), producing the desired product 3aa with
an 82% yield (entry 9). Notably, decreasing the catalyst loading from
10 mol% to 5 mol% and using an equimolar amount of base did not
affect the efficiency of the reaction (entry 10).

Having optimized the reaction conditions, we investigated the
generality of the syn-carboamination (Fig. 3a). We first examined
structural variations in the N-enoxyphthalimide (substrate 1; Fig. 3b). The
presence of a phenyl ring on substrate 1 proved essential. Electron-
donating and electron-withdrawing substituents located at the para,


Table 1 | Optimization of reaction conditions

[Scheme: N-enoxyphthalimide 1a and alkene 2a react under Method A, B or C to give the carboamination product 3aa, the cyclopropane 4aa and the ring-opened product 5aa. Catalyst: [Cp^xRh(CH3CN)3](SbF6)2. Ligands shown: Cp* (R = Me), Cp*^Cy (R = Cy), Cp*^CF3 (R = CF3), Cp*^tBu (R = tBu), Cp^t and Cp^iPr.]

| Entry | Method† | Cp^x | Solvent | Ratio 3aa/4aa‡ | Yield 3aa (%)§ |
|---|---|---|---|---|---|
| 1 | A | Cp* | Trifluoroethanol | 1/23 | 30%‖ |
| 2 | A | Cp* | Methanol | 2.8/1 | 49%¶ |
| 3 | B | Cp* | Methanol | 3.5/1 | 60% |
| 4 | B | Cp^iPr | Methanol | - | 0% |
| 5 | B | Cp^t | Methanol | - | <10% |
| 6 | B | Cp*^CF3 | Methanol | - | <10% |
| 7 | B | Cp*^Cy | Methanol | 8.0/1 | 69% |
| 8 | B | Cp*^tBu | Methanol | 8.4/1 | 72% |
| 9 | B# | Cp*^tBu | Methanol | 14.8/1 | 82% |
| 10 | C | Cp*^tBu | Methanol | 14.8/1 | 80%‖ |

†Method A: 1a (1 equiv.), 2a (1.2 equiv.), [Rh(III)] (10 mol%), caesium acetate (2 equiv.) in solvent (0.2 M), at room temperature for 16 hours. Method B: 1a (1 equiv.), 2a (1.2 equiv.), [Rh(III)] (10 mol%), caesium
acetate (CsOAc; 2 equiv.) in solvent (0.2 M), at room temperature for 16 hours then stirred in toluene (0.2 M) at 60 °C for 4 hours. Method C: 1a (1 equiv.), 2a (1.2 equiv.), [Rh(III)] (5 mol%), 1-adamantyl carboxylate
caesium (1-AdCO2Cs; 1 equiv.) in methanol (0.2 M), at room temperature for 16 hours then stirred in toluene (0.2 M) at 60 °C for 4 hours. The reaction is shown at the top of the inset figure. The bottom-left part of
the figure shows the prototypical structure of the Rh catalysts used here; the bottom-right part of the figure shows the defined structures of the cyclopentadienyl ligands.

‡Determined by analysis of the unpurified mixture by 1H nuclear magnetic resonance (NMR) spectroscopy.

§NMR yield.

‖Isolated yield.

¶Ratio 3aa/5aa/4aa = 2.8/1/1.

#1-AdCO2Cs was used as the base instead of CsOAc.

Cp, cyclopentadienyl; Cp*, penta-methyl-cyclopentadienyl; Cp*^CF3, trifluoromethyl-tetra-methyl-cyclopentadienyl; Cp*^Cy, cyclohexyl-tetra-methyl-cyclopentadienyl; Cp^iPr, isopropyl cyclopentadienyl; Cp^t, 1,3-di-tert-butyl-cyclopentadienyl; Cp*^tBu, tert-butyl-tetra-methyl-cyclopentadienyl; Cy, cyclohexyl; iPr, isopropyl; Me, methyl; Phth, phthalimide; Ph, phenyl; tBu, tert-butyl.


5 NOVEMBER 2015 | VOL 527 | NATURE | 87 
©2015 Macmillan Publishers Limited. All rights reserved 




Figure 3 | Applications of the carboamination reaction. a, General conditions for carboamination of 1,2-disubstituted alkenes. b, Effect of substituents in the N-enoxyphthalimide. c, Probe of reaction stereospecificity. d, Functionalization of 1,2-disubstituted alkenes. e, Functionalization of monosubstituted alkene. f, Derivatization of the carboamination adduct: formation of pyrrolidine (compound 6: 85%, 10/1 d.r.; compound 7: 65%, >20/1 d.r.). Ac, acetyl; Ad, adamantyl; Bn, benzyl; Boc, tert-butoxycarbonyl; Cy, cyclohexyl; Cp, cyclopentadienyl; Cp*, penta-methyl cyclopentadienyl; Et, ethyl; EtOH, ethanol; iPr, isopropyl; Me, methyl; MeOH, methanol; Ph, phenyl; Phth, phthalimide; r.r., regioisomeric ratio; RT, room temperature; tBu, tert-butyl; TBS, tert-butylsilyl. Yields are given as percentages; ratios of product 3/product 4 are also given.

meta, or ortho positions of the phenyl ring are all well tolerated,
providing the corresponding carboamination adducts 3ba-3ka in
good to high yields (30-76%). A few products were formed with low
yields, which we think correlates with the relative insolubility of their
derived starting materials (N-enoxyphthalimides 1e, 1j and 1k).

Figure 4 | Study and proposed reaction mechanism. a, Crossover experiment using an equimolar mixture of N-enoxyphthalimides

In order to probe the stereochemical outcome of the reaction, we
subjected fumarate and maleate esters (2a and 2b, Fig. 3c) to the
optimized reaction conditions. The reaction delivered isomeric
products 3aa and 3ab in high diastereoselectivity, suggesting that the
insertion event is a stereospecific syn addition across the alkene. We
next tested a variety of alkenes in our carboamination reaction
(Fig. 3d); we found that the reaction conditions are mild enough to 
tolerate sensitive functional groups such as silyl ethers, chloro-alkyls 
and fluoro-alkyls. The corresponding adducts 3ac-3ae were isolated in 
high yields (85-89%) with excellent chemoselectivity. The reaction 
also proceeds with hindered alkenes, leading to 3af and 3ag in 
excellent yields. Interestingly, in the case of unsymmetrical trans- 
1,2-disubstituted alkenes, the carboamination reaction takes place 
with a high control of regioselectivity, leading to products 3ah-3aj 
as the major regioisomers (53-86% yield). In all cases, the most 
bulky substituent is placed away from the phthalimide group. Also, 
N-phenylmaleimide 2k is a suitable substrate, giving the desired 
product 3ak with a 74% yield. Electron-rich alkenes such as
1,2-dihydrofuran (2l) and 1,2-dihydropyrrole (2m) are also reactive, and
produce disubstituted tetrahydrofuran (3al) and pyrrolidine (3am)
with a 69% and a 25% yield, respectively. Gratifyingly, both heterocycles
were obtained as single regioisomers and diastereoisomers, as found
previously. Finally, by switching to [Cp*CyRh(CH3CN)3](SbF6)2 as
the catalyst, the scope of the carboamination reaction was expanded to
include monosubstituted alkenes. Thus, when using ethyl acrylate (2n)
as a coupling partner, the unnatural α-amino acid derivative 3an is
isolated with a 53% yield (Fig. 3e).

The carboamination products, 3, are versatile entities. In addition to
showing similarity to unnatural α-amino acids, they may also be
converted into pyrrolidines (7; Fig. 3f). Deprotection of the phthalimide
group followed by cyclization affords the 1,2-dihydropyrrole 6
(diastereomeric ratio, d.r. = 10/1, 85% yield), which can be reduced
under heterogeneous conditions to yield pyrrolidine 7 in high
diastereoselectivity (d.r. > 20/1).
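
The selectivities quoted throughout (for example d.r. = 10/1 or a 3/4 ratio of 14.8/1) can be read as the share of the major component. The short Python sketch below is an illustration only and not part of the reported work; the function name `major_fraction` is hypothetical, and a bound such as ">20/1" is treated as its lower limit, 20/1.

```python
def major_fraction(ratio: str) -> float:
    """Return the fraction of the major component for a ratio such as '10/1'.

    A leading '>' (as in '>20/1') is dropped, so the result is a lower bound.
    """
    text = ratio.strip().lstrip(">")
    major, minor = (float(part) for part in text.split("/"))
    return major / (major + minor)


# d.r. = 10/1 corresponds to about 91% of the major diastereomer.
print(round(major_fraction("10/1"), 3))
# A 3aa/4aa ratio of 14.8/1 corresponds to about 94% of product 3aa.
print(round(major_fraction("14.8/1"), 3))
```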

In order to investigate the mechanism underlying the carboamina- 
tion reaction, we probed whether delivery of the phthalimide moiety 
occurs through an intramolecular or intermolecular process. We
carried out a crossover experiment by submitting an equimolar mixture of
N-enoxyphthalimides 1f and 1l to our optimized reaction conditions
(Fig. 4a). No crossover adduct 8 is formed, suggesting that, in
agreement with our initial proposal (Fig. 2b), delivery of the phthalimide
moiety takes place intramolecularly. Moreover, when product 3aa was
resubjected to the reaction conditions, product 5aa did not form (Fig. 4b),
suggesting that the phthalimide group opens before the final product
3aa is formed. Thus, the adduct 5aa might be formed first and then
cyclize back during the reaction to give 3aa. We confirmed this
assumption by monitoring the reaction progress using nuclear magnetic
resonance (see Supplementary Information). To elaborate further on 
this idea, we investigated the reactivity of a bidentate substrate. 
Attempts to open the phthalimide group with methanol were unsuc- 
cessful owing to the instability of the product. However, the parent 
substrate 1m proved more stable, and was subjected to the carboami- 
nation reaction (Fig. 4c). The expected product 3ma was indeed 
formed, albeit with a moderate yield (35%). A control experiment 
demonstrates that the carboamination product 3ma does not open 
in the presence of exogenous pyrrolidine under our standard reaction 
conditions (see Supplementary Information). Taken together, these 
results support our hypothesis that the directing group might be biden- 
tate and emerge from in situ opening of the phthalimide moiety. 

On the basis of these experiments, we propose the following
catalytic cycle (Fig. 4d). First, in the presence of methanol and a base, the
N-enoxyphthalimide 1a can reversibly open to form intermediate II
(Fig. 4d, route a). The active Rh(III) catalyst then undergoes an
irreversible carbon-hydrogen activation at the alkene position, leading to
the five-membered rhodacycle III. Alternatively, we cannot rule out
the possibility that the carbon-hydrogen activation event precedes the
opening of the phthalimide group (IV to II, Fig. 4d, route b). In either
case, migratory insertion of alkene 2 then generates the coordinatively
saturated Rh(III) complex V, with coordination of the ester group to
the metal. We postulate that the bidentate directing group formed
in situ stabilizes intermediate V, inhibiting both competitive migratory
insertion into the enol alkene and the β-H-elimination that forms the
corresponding diene by a Heck-type process. Instead, intermediate
V undergoes reductive elimination to form intermediate VI. An
oxidative addition of the nitrogen-oxygen bond into Rh(I) followed by
protonation/tautomerization of the enol liberates the opened product
5a, with concomitant regeneration of the active Rh(III) catalyst. Finally,
during the reaction, the phthalimide group is re-formed to afford
product 3a. The chemoselectivity might originate from
a solvent effect (Table 1, entry 1 versus entry 2). When using
methanol as the solvent, the initial opening of the phthalimide moiety
prevails, favouring the formation of intermediate III and therefore the
carboamination pathway. Conversely, the less-nucleophilic solvent
trifluoroethanol tends to preserve the integrity of the phthalimide,
and thus the cyclopropanation pathway is preferred.

We have developed a reaction that achieves syn-carboamination
of disubstituted alkenes. The reaction uses enoxyphthalimides and
a Rh(III) catalyst. Ligand development has revealed a new, bulky
cyclopentadienyl group that alters the inherent chemoselectivity of a
reaction. The use of methanol as a solvent is crucial, as is the
observation that the phthalimide group undergoes in situ ring opening.
Mechanistic experiments suggest that the basicity of the pendant
carbonyl stabilizes a Rh(III) intermediate by coordinative saturation,
leading to reductive elimination rather than to cyclopropanation. We are
now investigating ways to broaden this reaction and to develop an
asymmetric version of the transformation.
The genetic sex-determination system predicts
adult sex ratios in tetrapods


Ivett Pipoly, Veronika Bókony, Mark Kirkpatrick, Paul F. Donald, Tamás Székely & András Liker


The adult sex ratio (ASR) has critical effects on behaviour, ecology
and population dynamics, but the causes of variation in ASRs are
unclear. Here we assess whether the type of genetic sex
determination influences the ASR using data from 344 species in 117
families of tetrapods. We show that taxa with female heterogamety
have a significantly more male-biased ASR (proportion of males:
0.55 ± 0.01 (mean ± s.e.m.)) than taxa with male heterogamety
(0.43 ± 0.01). The genetic sex-determination system explains
24% of interspecific variation in ASRs in amphibians and 36% in
reptiles. We consider several genetic factors that could contribute
to this pattern, including meiotic drive and sex-linked deleterious
mutations, but further work is needed to quantify their effects.
Regardless of the mechanism, the effects of the genetic
sex-determination system on the adult sex ratio are likely to have profound
effects on the demography and social behaviour of tetrapods.

The adult sex ratio (ASR) varies widely in nature, ranging from 
populations that are heavily male-biased to those composed only of 
adult females*®. Birds and schistosome parasites tend to have male- 
biased ASRs, for example, whereas mammals and copepods usually 
exhibit female-biased ASRs*. Extreme bias occurs among marsupials 
(Didelphidae and Dasyuridae): males die after the mating season, so 
there are times when the entire population consists of pregnant 
females’. Understanding the causes and consequences of ASR vari- 
ation is an important goal in evolutionary biology, population demo- 
graphy and biodiversity conservation because the ASR affects 
behaviour, breeding systems and ultimately population fitness’?*°. 
It is also an important issue in social sciences, human health and 
economics, since unbalanced ASRs have been linked to violence, rape, 
mate choice decisions and the spread of diseases such as HIV'”"”. The 
causes of ASR variation in wild populations, however, remain 
obscure**", 

One factor that could affect the ASR is the genetic sex-determina- 
tion system”®"*. Taxa such as mammals and fruitflies (Drosophila) 
have XY sex determination (males are heterogametic), whereas taxa 
such as birds and butterflies have ZW sex determination (females are 
heterogametic). Sex-determination systems could affect the ASR in 
several ways. A skewed ASR might result from an unbalanced sex ratio 
at birth caused by sex ratio distorters'’. Alternatively, a biased ASR 
could develop after birth if sex chromosomes contribute to sex differ- 
ences in mortality*’*’*. Differential postnatal mortality is likely to be 
the main driver of biased ASRs in birds and mammals, since birth sex 
ratios in these classes tend to be balanced’. 

Here we use data from the four major clades of tetrapods (amphi- 
bians, reptiles, birds and mammals) to assess whether ASRs, measured 
by convention as the proportion of males in the population, 
differ between taxa with XY and ZW sex determination (Fig. 1 and 
Supplementary Data). While mammals and birds are fixed for XY and 
ZW sex determination, respectively, reptiles and amphibians 
provide particularly attractive opportunities for this study, since 


transitions between sex-determination systems have occurred many 
times within these clades'’. We compiled published data on adult sex 
ratios in wild populations and their sex-determination systems 
(Supplementary Data). To control for phylogenetic effects, we used 
phylogenetic generalized least squares (PGLS)'* models to test for 
differences in ASRs between XY and ZW taxa, and Pagel’s discrete 
method (PDM)” to test whether XY and ZW systems are evolutiona- 
rily associated with female-biased and male-biased sex ratios, respect- 
ively. Phylogenies were taken from recent molecular studies 
(see Methods for details). 

Both the ASR and the sex-determination system are highly variable 
across tetrapods (Fig. 1 and Supplementary Data). We find that the 
ASR and sex determination are correlated. Before controlling for 
phylogenetic effects, we find that ASRs are significantly more male- 
biased in species with ZW sex determination than in those with XY sex 
determination (Fig. 2, Table 1 and Extended Data Table 1). Similarly, 
the proportion of species with male-biased ASRs is greater among ZW 
than XY species (Fig. 1 and Table 1). These differences are significant 
within amphibians, within reptiles, and across tetrapods as a whole 
(Table 1 and Extended Data Table 1). 
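
As an illustration of the species-level summaries reported here (mean ASR ± s.e.m. and the percentage of species with a male-biased ASR), the following Python sketch computes them for a list of species values. It is a minimal example, not the authors' analysis code: the function name `summarize_asr` and the input values are hypothetical, and "male-biased" is taken to mean a proportion of males above 0.5.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_asr(asr_values):
    """Return (mean, s.e.m., % male-biased) for species-level ASR values.

    ASR is the proportion of males among adults; a species counts as
    male-biased when its ASR exceeds 0.5 (an assumption of this sketch).
    """
    n = len(asr_values)
    sem = stdev(asr_values) / sqrt(n)  # sample s.d. divided by sqrt(n)
    pct_male_biased = sum(v > 0.5 for v in asr_values) / n * 100
    return mean(asr_values), sem, pct_male_biased


# Made-up ASR values for four hypothetical species.
avg, sem, pct = summarize_asr([0.4, 0.5, 0.6, 0.7])
print(f"{avg:.2f} ± {sem:.3f}; {pct:.0f}% male-biased")
```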

The pattern remains significant after controlling for phylogenetic 
effects. Both the mean of ASR across species (analysed using PGLS) 
and the proportion of species with male-biased sex ratios (analysed 
using PDM) differ significantly between XY and ZW systems within 
amphibians, within reptiles, and across tetrapods as a whole (Table 1
and Extended Data Table 1). The effect is strong in clades with
variation in sex determination: the type of genetic sex determination
explains up to 24% of the interspecific variance in the ASR among 
amphibians and 36% in reptiles (estimated using PGLS; Extended Data 
Table 2). The results remain significant when we treat three large 
clades with invariant sex-determination systems as a single datum each 
(snakes, ZW; birds, ZW; mammals, XY; Extended Data Table 1), when 
we make different assumptions about branch lengths in the phylogeny 
(Extended Data Table 2), and when we use arc-sine-transformed ASR 
values and control for variance in sample size (see Methods). 
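
For readers unfamiliar with the transformation mentioned above, a common variance-stabilizing choice for proportions is the arc-sine square-root transform. The Python sketch below assumes that form; the text says only "arc-sine-transformed", so the exact formula and the function name `arcsine_transform` are assumptions rather than the authors' stated procedure.

```python
from math import asin, sqrt


def arcsine_transform(proportion: float) -> float:
    """Arc-sine square-root transform of a proportion in [0, 1], in radians."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return asin(sqrt(proportion))


# An unbiased ASR of 0.5 maps to pi/4 (about 0.785 rad).
print(round(arcsine_transform(0.5), 3))
```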

Body size and breeding latitude correlate with life-history traits in 
many organisms and these traits could affect ASR*°. Sexual size 
dimorphism is linked to differential sexual selection acting on males 
and females and thus influences sex-specific mortality, and has been 
suggested to drive the evolution of genetic sex-determination sys- 
tems*'. Nevertheless, we find that neither body size nor breeding lat- 
itude explains significant variation in the ASR in phylogenetically 
controlled multi-predictor analyses (Table 2). Sexual size dimorphism 
is significantly associated with ASR in reptiles and across tetrapods as a 
whole, but the effect of the genetic sex-determination system remains 
significant when size dimorphism is included in the analysis (Table 2). 

Sex differences in dispersal may also result in biased ASRs. However,
dispersal is unlikely to explain the relationship between ASR and
sex-determination systems. First, male-biased dispersal is typical in reptiles
regardless of the sex-determination system (Supplementary
Information 1). Second, there is no relationship between the ASR
and sex bias in dispersal distance in birds (Supplementary
Information 1). Finally, the relationship between sex determination
and the ASR remains significant when the influence of sex-biased
dispersal is controlled in multi-predictor models in tetrapods
(Supplementary Information 1).

Figure 1 | Phylogenetic distribution of the ASR and genetic sex-determination systems across tetrapods. Inner band shows the type of sex determination (red: XY, blue: ZW), and the outer band shows the ASR bias for each species included in the study (red: ≤ 0.5, blue: > 0.5 proportion of males). Sample sizes: 39 species for amphibians, 67 species for reptiles, 187 species for birds and 51 species for mammals (see Supplementary Data).

Figure 2 | Variation in the ASR as a function of the sex-determination system in amphibians, reptiles, mammals and birds, and across tetrapods (all four clades combined). Adult sex ratio is the proportion of males in all adults. Central dots and solid whiskers are mean ± s.e.m., horizontal bars are medians, and boxes and dashed whiskers show the interquartile ranges and data ranges, respectively, based on species values. Numbers of species are at the bottom of each panel. See Table 1 and Extended Data Table 1 for statistical results, and Extended Data Fig. 1 for phylogenetically corrected graphs.

The sex-determination system may affect the ASR in the directions 
seen in the data in several ways. First, sexual selection can fix mutations 
that increase male mating success and decrease male survival. These 
will accumulate on Y but not on W chromosomes, and will accumulate 
more readily on X than on Z chromosomes if they tend to be recessive. 
Second, biased ASRs could result from recessive mutations at loci 
carried on the X (or Z) chromosome but absent from the Y (or W) 
chromosome since they are not masked in the heterogametic sex (the 
‘unguarded sex chromosome’ hypothesis)**"*, and from deleterious 
mutations carried on the Y (or W) but not on the X (or Z) chro- 
mosome. At loci carried on both sex chromosomes, alleles on the Y 
(or W) can show partial degeneration”’. Population genetic models 
suggest that deleterious mutation pressure alone may not be adequate 
to explain ASR biases as large as those observed (Supplementary 
Information 2), but the models do not include factors that could be 
important, notably the degeneration of Y and W chromosomes by 
genetic drift**. A third hypothesis is imperfect dosage compensation, 
which may be deleterious to the heterogametic sex™*. Fourth, distorted 
sex ratios can result from meiotic drive acting on sex chromosomes”. 
Drive more often produces female-biased sex ratios in XY systems at 
birth”®. There is little data on drive in ZW systems, but if it operates in a 
symmetrical fashion then we expect it to cause male-biased sex ratios. 
Fifth, the Y and W chromosomes might degenerate during the life- 
span, for example by telomere shortening or loss of epigenetic marks, 
more rapidly than the X and Z chromosomes. A final possibility is that 
sex-antagonistic selection acting on sex-linked loci could lead to biased 
sex ratios, but unlike the preceding hypotheses there does not seem to
be a robust prediction about the direction of the ASR bias it will
produce (Supplementary Information 2).

                                 Mean ASR                       Species with male-biased ASR (%)
Taxon       Number of species    XY     ZW     t-test†  PGLS†   XY     ZW     PDM†
Amphibians  39                   0.51   0.61   ae       bai     42.9   90.9   -
Reptiles    67                   0.45   0.57   ail      ital    24.2   73.5   x
Birds       187                  -      0.55   -        -       -      76.5   -
Mammals     51                   0.37   -      -        -       9.8    -      -
Tetrapods   344                  0.43   0.55   ital     Te      22.3   77.2   ere

Mean ASR (proportion of males in the population), t-tests and the percentage of species with male-biased ASRs represent species-level statistics and analyses, while PGLS and PDM were used for phylogenetically corrected analyses of the difference in ASR between XY and ZW species.
*P<0.05; **P<0.01; ***P<0.001; '-' denotes no data or not tested.
† Detailed results of the statistical analyses are presented in Extended Data Table 1.


Table 2 | Phylogenetically corrected multi-predictor analyses of ASR variation

                          Amphibians (n = 39)            Reptiles (n = 67)              Tetrapods (n = 259)
                          b (±s.e.m.)     t     P        b (±s.e.m.)     t     P        b (±s.e.m.)     t     P
Sex-determination system  0.10 (±0.03)    3.38  0.002    0.10 (±0.02)    4.56  <0.001   0.10 (±0.02)    5.23  <0.001
Body size                 0 (±0)          1.41  0.166    0 (±0)          0.78  0.440    0 (±0)          0.05  0.962
Breeding latitude         0 (±0)          0.13  0.898    0 (±0)          0.04  0.966    0 (±0)          0.24  0.811
Sexual size dimorphism    -0.32 (±0.34)   0.92  0.363    -0.31 (±0.15)   2.17  0.034    -0.38 (±0.07)   5.57  <0.001

Relationships between the ASR, sex-determination system and other factors in phylogenetically corrected multi-predictor analyses using PGLS models. Separate models of ASR were constructed for amphibians, reptiles and all tetrapods combined. For sex determination, b is the estimated difference in ASR between ZW an


The limited data available do not provide clear support for any of 
these hypotheses, although critical tests are lacking. For instance, the 
meiotic drive process predicts biased sex ratios at birth. Although a 
recent comparative analysis in birds suggests that sex ratios at birth are 
unrelated to biased ASRs’, offspring sex ratios have not been compared 
between different sex-determination systems. Further insight might 
come from the study of dioecious plants with biased sex ratios’, but 
their skewed ASRs could result from selection on the gametophytic 
stage that is absent from animals”. Evolutionary feedbacks from the 
ASR to the sex-determination system are also possible: for example, 
the ASR could influence sexual size dimorphism and sexual conflict, 
which in turn could trigger transitions in sex determination’’””. 

In conclusion, we demonstrate strong and phylogenetically robust 
associations between genetic sex-determination systems and a demo- 
graphic property of populations, the ASR. Although the mechanisms 
that drive this association need further theoretical and empirical ana- 
lyses, the observed pattern is biologically important for two reasons. 
First, changes in sex-determination systems are expected to have 
knock-on effects on social behaviour. Theory suggests that the ASR 
affects violence, pair bonds, infidelity and parental care’, and 
field-based studies support these predictions®’®’*. For instance, 
female-biased ASRs co-occur with polygyny and female care, whereas 
male-biased ASRs tend to co-occur with polyandry and male care in 
birds’. Second, sex-determination systems may have important demo- 
graphic consequences through skewed birth sex ratios and sex-biased 
survival. Such biases may not only affect the productivity and growth 
of populations, but also their genetic composition and viability. 
Further theoretical, experimental and comparative studies are clearly 
needed to understand the linkages between sex determination, demo- 
graphy and social behaviour. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Data collection. We collected data on the ASR (expressed by convention as the 
proportion of males in the adult population) in amphibians and reptiles from 
literature published by December 2013, by searching in Google Scholar and 
Web of Science with the key words ‘sex ratio’ and ‘reptile’ or ‘amphibian’ or the 
scientific names of species. We also used reviews to identify additional data 
sources***. ASR data for mammals’ were obtained from a similar search 
finished in 2007; and we used avian ASR estimates from our existing data set 
(supplementary information of ref. 10). We aimed to collect all ASR data that 
were available for amphibian and reptile species with known sex determination, so 
no statistical methods were used to pre-determine sample sizes. During the col- 
lection of ASR data for amphibians and reptiles, investigators were blinded to the 
type of sex determination. ASR data for birds and mammals were collected before 
the initiation of the current study, and for different purposes. 

We specifically collected ASR data for amphibians and reptiles from studies that 
aimed to obtain representative estimates for the population composition and thus 
provide reliable sex ratio data**. These include either long-term demographic 
studies applying mark-recapture or culling methods (that is, each individual 
was counted only once) with similar capture probabilities for the sexes, or total 
population counts. When more than one measure was available, we used the total 
counts of individually marked animals over the study period because this may best 
approximate the overall ASR. We excluded studies in which the authors explicitly 
stated or speculated that their data may not represent the population-level ASR, or 
when the methods were not described in enough detail to assess the reliability of 
the ASR estimate. Moreover, we tested whether ASR estimates differed between 
sampling (hand-capture, trap, other) and marking (mark-recapture, culling) 
methods, and we found no such differences (linear mixed-effects model with species 
as random factor, sampling: Fis, 195) = 0.50, P = 0.683; marking: Fo, i195) = 2.18, 
P= 0.118; n = 234 records). When more than one estimate of the ASR was available 
for the same population (for example, from several yearly counts at the same loca- 
tion) we took their mean weighted by sample size. When more than one independent 
record was available for a species from different populations or studies, we used their 
simple mean. Weighted and non-weighted mean ASRs were highly correlated 
(amphibians: Pearson’s r= 0.973, P< 0.001, n= 35 species; reptiles: r= 0.995, 
P<0.001, n = 60 species); we used non-weighted averages because not all studies 
reported sample size. 
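
The two-level averaging described above (a sample-size-weighted mean within a population, then a simple mean across populations) can be sketched in Python as follows. This is an illustrative reading of the text, not the authors' code; the function names and the example records are hypothetical.

```python
from statistics import mean


def population_asr(records):
    """Sample-size-weighted mean ASR for one population.

    `records` is a list of (asr, sample_size) pairs, e.g. yearly counts
    made at the same location.
    """
    total = sum(n for _, n in records)
    return sum(asr * n for asr, n in records) / total


def species_asr(populations):
    """Simple (unweighted) mean of the population-level ASRs of a species."""
    return mean(population_asr(records) for records in populations)


# Hypothetical species with two populations: one counted in two years,
# the other counted once.
pop_a = [(0.4, 10), (0.6, 30)]   # weighted mean = 0.55
pop_b = [(0.45, 5)]              # single estimate = 0.45
print(round(species_asr([pop_a, pop_b]), 3))
```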

We categorized the genetic sex-determination (GSD) systems of the species 
from published sources either as male-heterogametic (XY) or female-heteroga- 
metic (ZW). For amphibians, only species with known GSD systems were 
included*'*4, because GSD is an evolutionarily labile trait in amphibians; species 
within a genus or even populations within a species can differ in GSD system””. For 
reptiles, we included species for which the GSD was known either at the family 
level or at the species level if both XY and ZW systems were found in the 
family**°°*’. Our result for reptiles is not changed qualitatively by restricting 
our analyses to those species for which the GSD is known at species level**, that 
is, when species for which we assumed the GSD based on other species in the 
family were excluded (difference between XY and ZW reptile species, PGLS
model; b ± s.e.m. = 0.11 ± 0.02; t = 4.70, P < 0.001, n = 26; R² = 0.479). All
birds were assigned to ZW, and all mammals to XY sex-determination systems.

We also collected data on three additional ecological and behavioural variables 
to control for their known correlation with the ASR and so reduce potential 
confounding effects in multi-predictor analyses. First, we used body size, which 
was measured as snout-to-vent length (in mm) for amphibians and squamates, 
and carapace length for the two turtle species, where possible from the same 
population for which ASR was reported. Head-body length was used for mammals 
(n = 36) (Encyclopedia of Life, http://www.eol.org). Since head-body length is not 
available for the vast majority of birds, we calculated this from the total body length 
by subtracting bill and tail length (n = 133; Supplementary Data). Where we had 
sex-specific data, the mean of male and female head-body length was used as body 
size variable in the analyses. 

Second, we estimated sexual size dimorphism as log10(male body size) - log10
(female body size). For birds, we used body mass dimorphism (data available for
n = 181 species) owing to the lack of sex-specific body length data. The results of
the multivariate PGLS model of tetrapods presented in Table 2 remain
qualitatively the same when wing length dimorphism (data available for n = 153 species)
is used for birds instead of body mass dimorphism (effect of sex determination:
b ± s.e.m. = -0.10 ± 0.02, t = 4.97, P < 0.001; body size: b ± s.e.m. = 0 ± 0,
t = 0.06, P = 0.949; latitude: b ± s.e.m. = 0 ± 0, t = 0.223, P = 0.823; size
dimorphism: b ± s.e.m. = -0.52 ± 0.12, t = 4.33, P < 0.001; n = 248 species).
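
The dimorphism index defined at the start of this paragraph is a difference of base-10 logarithms, so it is positive when males are larger and negative when females are larger. A minimal Python sketch follows; the function name `size_dimorphism` and the example sizes are illustrative assumptions.

```python
from math import log10


def size_dimorphism(male_size: float, female_size: float) -> float:
    """Sexual size dimorphism as log10(male size) - log10(female size).

    Both sizes must be positive and in the same units (for example mm).
    """
    return log10(male_size) - log10(female_size)


# Hypothetical species in which males are 20% longer than females.
print(round(size_dimorphism(120.0, 100.0), 4))
```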

Third, we included breeding latitude***' as the geographic coordinates of the 
ASR studies for amphibians and reptiles, taking absolute values to represent dis- 
tance from the Equator in latitudinal degree. When the authors did not report 
latitude, we used Google Earth to estimate it on the basis of the description of the 
study site. For birds and mammals, we used the latitudinal midpoint of the
breeding range of the species (n = 182 and 44 species, for birds and mammals,
respectively; sources: V. Remes, A. Liker, R. Freckleton & T. Székely unpublished data for
birds, and the PanTHERIA database for mammals, respectively). Mean values of
these variables were used if multiple data of body size, latitude or size dimorphism 
per species were available. 

Other possible confounding factors include the lifespan of individuals and sex- 
specific dispersal distances. First, longer average lifespan may lead to exaggeration 
of ASR bias. However, in species with available data”, lifespan is unrelated to the 
ASR (PGLS, birds: b ± s.e.m. = 0 ± 0, t = 0.196, P = 0.845, n = 71 species;
mammals: b ± s.e.m. = 0 ± 0, t = 0.751, P = 0.457, n = 35 species) and also to the
absolute deviation of the ASR from 0.5 (that is, when assuming that longer lifespan
can exaggerate ASR bias in either direction; birds: b ± s.e.m. = 0 ± 0, t = 1.543,
P = 0.127, n = 71 species; mammals: b ± s.e.m. = 0 ± 0, t = 0.180, P = 0.858,
n = 35 species). Second, sex-specific dispersal can bias the ASR owing to the higher 
mortality in the sex with longer dispersal distances. However, we found no evidence 
of a relationship of sex bias in dispersal either with the GSD in reptiles or with the 
ASR in birds (Supplementary Information 1). For these reasons, and because data 
on lifespan and/or sex-specific dispersal are not available for most species in our 
ASR data set, we did not include these variables in the main multi-predictor models. 

Our final data set comprises data on 39 amphibian species and 67 reptile species 
(in total, n = 229 ASR records from different populations), 187 bird species and 51 
mammalian species (a total of 344 species). We could not find body size and 
latitude data for some species, thus sample sizes were reduced in multi-predictor 
models. All species-level data and their sources are given in Supplementary Data. 
Data analysis. To assess the reliability of the amphibian and reptile ASR estimates, 
we calculated the repeatability of ASR as the intraclass correlation coefficient 
(ICC) following ref. 44, using only those species for which we had at least two 
ASR estimates from different populations. These analyses show a moderate
repeatability of ASR, and that a significant part of ASR variation is interspecific
(amphibians: ICC = 0.559, F(22,96) = 7.27, P < 0.001, n = 23 species, n = 120 records;
reptiles: ICC = 0.524, F(13,26) = 4.11, P = 0.001, n = 14 species, n = 40 records).
For birds, our earlier analyses showed that 44% of the ASR variation was
interspecific, and that the direction of ASR (that is, male- or female-biased) was highly
conserved: in 44 out of 55 species (80%), the direction of the ASR bias was the same
for all repeated estimates. For mammals, we did not find enough multiple ASR
data within species to estimate repeatability. 
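
The repeatability calculation described above can be sketched as follows. This is a minimal illustration that assumes the standard one-way ANOVA estimator of the intraclass correlation with an effective group size for unequal numbers of records per species; whether ref. 44 uses exactly this estimator is an assumption, and the function name and input format are hypothetical.

```python
from collections import defaultdict


def repeatability(records):
    """One-way ANOVA repeatability (intraclass correlation coefficient).

    records: iterable of (species, asr) pairs, one pair per population estimate.
    Returns (icc, f_ratio, df_among, df_within).
    """
    groups = defaultdict(list)
    for species, asr in records:
        groups[species].append(asr)
    # Use only species with at least two ASR estimates, as in the text.
    groups = {s: v for s, v in groups.items() if len(v) >= 2}

    n_species = len(groups)
    n_total = sum(len(v) for v in groups.values())
    grand_mean = sum(sum(v) for v in groups.values()) / n_total

    # Sums of squares among species and within species.
    ss_among = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
                   for v in groups.values())
    ss_within = sum((x - sum(v) / len(v)) ** 2
                    for v in groups.values() for x in v)

    df_among = n_species - 1
    df_within = n_total - n_species
    ms_among = ss_among / df_among
    ms_within = ss_within / df_within

    # Effective number of records per species for an unbalanced design.
    n0 = (n_total - sum(len(v) ** 2 for v in groups.values()) / n_total) / df_among
    var_among = (ms_among - ms_within) / n0
    icc = var_among / (var_among + ms_within)
    return icc, ms_among / ms_within, df_among, df_within
```

Under this estimator the degrees of freedom are (species − 1) among species and (records − species) within species.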

In the comparative analyses we used the topology of ref. 45 for amphibians, a
composite phylogeny for reptiles, ref. 49 for birds, the family-level relationships
of ref. 50 and the genus/species level relationships of ref. 51 for mammals.
For analyses across tetrapods, the branching topology between these four major
clades was based on recent tetrapod phylogenies (Fig. 1). Because we did not
have branch length information for these composite phylogenies, we ran the
analyses using arbitrary gradual branch lengths according to Nee’s method.
However, our results remained consistent when we repeated the analyses with
other branch length assumptions (Pagel’s method and unit branch lengths;
Extended Data Table 2).

To test the association between ASR bias (male- versus female-biased) and GSD
(XY versus ZW) in phylogenetically corrected analyses, we used PDM as
implemented in BayesTraits. We used maximum likelihood methods to fit independent
and dependent models for transitions in ASR bias and GSD states, and compared
the fit of these two models by a likelihood ratio test. To test the ASR difference
between XY and ZW species, we used PGLS models with maximum likelihood
estimates of Pagel’s λ values using the R package ‘caper’. ASR was the
response variable in all models, and the genetic sex-determination system was
fitted as the predictor (Table 1 and Extended Data Table 1). The parameter
estimate b shows the difference in ASR (proportion of males in the population)
between ZW and XY species. To test the robustness of the bivariate results, we
added body size, breeding latitude and sexual size dimorphism as predictors in
multi-predictor models to control for their potential confounding effects (Table 2).
As in earlier ASR studies, the distribution of ASR values did not deviate
significantly from normal in the four clades separately or in tetrapods as a whole;
our results remain qualitatively identical when ASR is arc-sine-transformed before
PGLS analyses (amphibians: b ± s.e.m. = 0.10 ± 0.03, t(37) = 3.44, P = 0.001,
n = 39; reptiles: b ± s.e.m. = 0.12 ± 0.02, t(65) = 5.95, P < 0.001, n = 67; tetrapods:
b ± s.e.m. = 0.11 ± 0.02, t(342) = 5.24, P < 0.001, n = 344).
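
The likelihood ratio comparison of the independent and dependent transition models can be sketched as below. The sketch assumes that the dependent model has four more free parameters than the independent one, so the statistic is referred to a chi-square distribution with 4 degrees of freedom; the text does not state the degrees of freedom, so this is an assumption, and the function names are hypothetical.

```python
import math


def chi2_sf_df4(x):
    """Upper-tail probability of a chi-square variate with 4 d.f.

    For 4 d.f. the survival function has the closed form
    exp(-x/2) * (1 + x/2), so no statistics library is needed.
    """
    if x <= 0:
        return 1.0
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def likelihood_ratio_test(lnl_independent, lnl_dependent):
    """Return (LR, P) for nested models fitted by maximum likelihood."""
    lr = 2.0 * (lnl_dependent - lnl_independent)
    return lr, chi2_sf_df4(lr)
```

With this assumption, `chi2_sf_df4(10.5)` evaluates to about 0.033, which agrees with the PDM entry reported for amphibians in Extended Data Table 1.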

The difference between XY and ZW systems for tetrapods is not sensitive to the
inclusion of large clades with uniform sex-determination systems (snakes and
birds are all ZW, mammals are all XY) because it remains unchanged when each
of these clades is reduced to a single datum of its mean ASR (PGLS: b ± s.e.m.
= 0.10 ± 0.02, t = 5.07, P < 0.001, R² = 0.232, n = 87). Furthermore, our
result is also robust to between-species differences in sample size: when we added
log(number of individuals) to the previous model, the effect of sex determination
remained significant (b ± s.e.m. = 0.15 ± 0.07, t = 2.08, P = 0.041), while sample
size had no significant effect on ASR (b ± s.e.m. = 0 ± 0.01, t = 0.35, P = 0.72,
n = 78). Furthermore, sample size was not a significant predictor of ASR when we
added it as a fourth confounding variable in the full PGLS model (b ± s.e.m. = 0
± 0.01, t = 1.16, P = 0.250, n = 78), and the effect of other predictors remained
qualitatively the same as in Table 2. Finally, the results do not change when we use
only the most reliable ASR data (based on mark-recapture or culling methods):
sex-determination system is significantly related to ASR in amphibians, reptiles
and tetrapods (PGLS results, amphibians: b ± s.e.m. = 0.09 ± 0.03, t = 3.07,
P = 0.004, n = 35 species; reptiles: b ± s.e.m. = 0.11 ± 0.03, t = 3.974, P < 0.001,
n = 22; tetrapods with snakes, birds and mammals included as single data points:
b ± s.e.m. = 0.10 ± 0.02, t = 4.23, P < 0.001, n = 55).
Population genetic models. We developed population genetic models of the 
effects that deleterious mutation and sex-antagonistic selection might have on 
the ASR (Supplementary Information 2). The models assume that deleterious 
mutations are largely or entirely recessive, that they have multiplicative fitness 
effects across loci, that the loci are fully sex-linked and in linkage equilibrium, that 
mutation is not sex-biased, and that selection is strong relative to mutation and 
drift. Fitness effects of mutations in hemizygotes and homozygotes are assumed 
equal. Full details of the models are given in Supplementary Information 2. Here 
we summarize the key results. 

When deleterious alleles reach a mutation-selection balance, with XY sex
determination the mean viability of males relative to females is

$$W_m \approx \exp\{-3U_X - U_Y\},$$

where $U_X$ and $U_Y$ are the total rates of mutation to deleterious alleles across all loci
on the X and Y chromosomes. With ZW sex determination, the mean viability of
females relative to males is

$$W_f \approx \exp\{-3U_Z - U_W\},$$

where $U_Z$ and $U_W$ are the total rates of mutation to deleterious alleles across all loci
on the Z and W chromosomes. Using very rough estimates for rates of deleterious 
mutations appropriate for human sex chromosomes, we estimate that mutation- 
selection balance might bias the ASR by a few per cent. This degree of bias is 
substantially less than that seen in our data. We emphasize that the conclusion 
could be quite different using other parameter values, or if the model was extended 
to include stochastic effects. 
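
As a rough numerical illustration of the two expressions above, the sketch below converts a relative viability into an adult sex ratio. It assumes an even sex ratio before selection and that this viability difference is the only source of bias; both are simplifications added here, not statements from the models in Supplementary Information 2. The function names and any mutation rates passed in are hypothetical.

```python
import math


def relative_viability(u_shared, u_limited):
    """Viability of the heterogametic sex relative to the homogametic sex.

    u_shared:  total deleterious mutation rate on the X (or Z) chromosome.
    u_limited: total deleterious mutation rate on the Y (or W) chromosome.
    Implements W = exp(-3 * U_shared - U_limited).
    """
    return math.exp(-3.0 * u_shared - u_limited)


def adult_sex_ratio(system, u_shared, u_limited):
    """ASR as the proportion of males, assuming a 1:1 ratio before selection."""
    w = relative_viability(u_shared, u_limited)
    if system == "XY":  # males are heterogametic, so males have viability w
        return w / (1.0 + w)
    if system == "ZW":  # females are heterogametic, so females have viability w
        return 1.0 / (1.0 + w)
    raise ValueError("system must be 'XY' or 'ZW'")
```

For small mutation rates the resulting ASR departs from 0.5 by roughly (3U_shared + U_limited)/4, female-biased under XY and male-biased under ZW.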

The second hypothesis to explain biased ASRs that we explored with models is 
sex-antagonistic selection, the situation in which alleles are selected differently in 
females and males. In Supplementary Information 2, we use numerical examples 
to show that under both XY and ZW sex determination, either a female-biased or 
male-biased ASR can result. Thus there does not seem to be a robust generalization 
about how sex-antagonistic selection will bias the ASR. 
Extended Data Figure 1 | Phylogenetically corrected mean and s.e.m. of
ASR in clades with different sex-determination systems. Parameter
estimates for the mean and associated s.e.m. were calculated by PGLS models
presented in Extended Data Table 2 (with branch lengths estimated by Nee’s
method).
Extended Data Table 1 | Detailed analyses of the effect of sex-determination system on the ASR.

                                    Species level          Phylogenetically corrected
                                    t-test                 PDM                    PGLS
                                    t-value  p-value (n)   LR    p-value (n)      t-value  p-value (n)
Amphibians (XY vs. ZW)              3.039    0.008 (39)    10.5  0.033 (39)       3.418    0.002 (39)
Reptiles (XY vs. ZW)                6.018    <0.001 (67)   11.3  0.023 (67)       5.996    <0.001 (67)
Mammals (XY) vs. birds (ZW)         8.982    <0.001 (238)  not tested             not tested
Tetrapods, all species (XY vs. ZW)  9.790    <0.001 (344)  53.6  <0.001 (344)     9.313    <0.001 (344)
Tetrapods, reduced† (XY vs. ZW)     4.801    <0.001 (87)   17.9  0.001 (87)       5.072    <0.001 (87)

These are extensions of Table 1 showing details of the phylogenetically uncorrected (t-tests) and phylogenetically corrected (PGLS and PDM) analyses. Birds and mammals were not tested with phylogenetic
control because there is no variation in the type of sex-determination system within birds and mammals. In the reduced analysis (marked by †), snakes, birds and mammals were each included as a single datum
with mean species values.
Extended Data Table 2 | Phylogenetically controlled analyses of the relationship between ASR and genetic sex-determination system using
different branch length assumptions.

Taxa                 Branch lengths        b ± SE          t      P        R²     λ
Amphibians (n = 39)  Nee's                 0.101 ± 0.030   3.418  0.002    0.240  0.000
                     Pagel's               0.101 ± 0.030   3.418  0.002    0.240  0.000
                     Unit branch lengths   0.076 ± 0.027   2.821  0.008    0.177  0.000
Reptiles (n = 67)    Nee's                 0.114 ± 0.019   5.996  < 0.001  0.356  0.000
                     Pagel's               0.114 ± 0.019   5.968  < 0.001  0.354  0.000
                     Unit branch lengths   0.114 ± 0.020   5.702  < 0.001  0.333  0.000
Tetrapods (n = 344)  Nee's                 0.109 ± 0.020   5.313  < 0.001  0.076  0.409
                     Pagel's               0.106 ± 0.021   4.998  < 0.001  0.068  0.332
                     Unit branch lengths   0.093 ± 0.020   4.581  < 0.001  0.058  0.469

These are the results of PGLS models as implemented in the R package ‘caper’, showing parameter estimates (b) as the difference in ASR (ZW − XY), the proportion of interspecific variance (R²) in ASR
explained by the sex-determination system (female-heterogametic, ZW; or male-heterogametic, XY), calculated by PGLS; and the degree of phylogenetic dependence (λ). The models assume gradual branch
lengths calculated either by Nee’s or by Pagel’s method, or unit branch lengths.
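
The t and R² columns of a single-predictor model are linked by a simple identity, which gives a quick consistency check on the table. The sketch below assumes that R² was computed in the usual way for a regression with one predictor and an intercept (residual degrees of freedom n − 2); that this is how the reported values were obtained is an assumption, and the function name is hypothetical.

```python
def r_squared_from_t(t_value, n_species, n_params=2):
    """R-squared implied by the t statistic of a single-predictor model.

    Uses R^2 = t^2 / (t^2 + df), with df = n_species - n_params
    (intercept plus one slope by default).
    """
    df = n_species - n_params
    return t_value ** 2 / (t_value ** 2 + df)
```

For example, `r_squared_from_t(3.418, 39)` is about 0.240 and `r_squared_from_t(5.996, 67)` is about 0.356, in line with the amphibian and reptile rows for Nee's branch lengths.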
Differential responses to lithium in hyperexcitable
neurons from patients with bipolar disorder


Jerome Mertens, Qiu-Wen Wang, Yongsung Kim, Diana X. Yu, Son Pham, Bo Yang, Yi Zheng, Kenneth E. Diffenderfer,
Jian Zhang, Sheila Soltani, Tameji Eames, Simon T. Schafer, Leah Boyer, Maria C. Marchetto, John I. Nurnberger,
Joseph R. Calabrese, Ketil J. Ødegaard, Michael J. McCarthy, Peter P. Zandi, Martin Alda, Caroline M. Nievergelt,
The Pharmacogenomics of Bipolar Disorder Study, Shuangli Mi, Kristen J. Brennand, John R. Kelsoe, Fred H. Gage & Jun Yao


Bipolar disorder is a complex neuropsychiatric disorder that is 
characterized by intermittent episodes of mania and depression; 
without treatment, 15% of patients commit suicide. Hence, it has
been ranked by the World Health Organization as a top disorder
of morbidity and lost productivity. Previous neuropathological
studies have revealed a series of alterations in the brains of patients
with bipolar disorder or animal models, such as reduced glial
cell number in the prefrontal cortex of patients, upregulated
activities of the protein kinase A and C pathways and changes
in neurotransmission. However, the roles and causation
of these changes in bipolar disorder have been too complex to 
exactly determine the pathology of the disease. Furthermore, 
although some patients show remarkable improvement with 
lithium treatment for yet unknown reasons, others are refractory to 
lithium treatment. Therefore, developing an accurate and powerful 
biological model for bipolar disorder has been a challenge. The 
introduction of induced pluripotent stem-cell (iPSC) technology 
has provided a new approach. Here we have developed an iPSC 
model for human bipolar disorder and investigated the cellular 
phenotypes of hippocampal dentate gyrus-like neurons derived 
from iPSCs of patients with bipolar disorder. Guided by RNA 
sequencing expression profiling, we have detected mitochondrial 
abnormalities in young neurons from patients with bipolar 
disorder by using mitochondrial assays; in addition, using both 
patch-clamp recording and somatic Ca²⁺ imaging, we have
observed hyperactive action-potential firing. This hyperexcitability 
phenotype of young neurons in bipolar disorder was selectively 
reversed by lithium treatment only in neurons derived from 
patients who also responded to lithium treatment. Therefore, 
hyperexcitability is one early endophenotype of bipolar disorder, 
and our model of iPSCs in this disease might be useful in 
developing new therapies and drugs aimed at its clinical treatment. 

We collected and reprogrammed fibroblasts of six patients with
manic type I bipolar disorder (BD) and four unaffected individuals
using recombinant Sendai viral vectors expressing the four Yamanaka
factors (Extended Data Fig. 1a–c). On the basis of a series of quality
control examinations, we selected two clones from each individual
for functional experiments (Extended Data Fig. 1d–j). The
hippocampus of patients with BD often shows a reduced number of
neurons, indicating that hippocampal neurons probably exhibit
cellular phenotypes of BD. We therefore differentiated the iPSCs
into hippocampal dentate gyrus (DG) granule cell-like neurons
using a newly reported protocol (Fig. 1a, b). More than 80% of the
differentiated cells were VGLUT1-positive glutamatergic neurons,
most of which were DG granule cell-like neurons that could be
identified by a Prox1 promoter-driven lentiviral vector expressing
enhanced green fluorescent protein (eGFP) (Prox1::eGFP) or
an anti-Prox1 antibody; only 2–7% of cells were GABAergic
(γ-aminobutyric-acid-releasing) neurons (Fig. 1c, d and Extended
Data Fig. 2). Normal and BD neurons showed similar densities of
glutamatergic and GABAergic synapses (Fig. 1e, f).

To assess the genetic factors that distinguish patients with BD
from healthy people, we performed total RNA sequencing (RNA-seq)
analysis to compare the gene expression profiles between
3-week-old BD and normal neurons (Fig. 1g). Compared with normal
neurons, 45 genes were significantly differentially expressed
in the diseased neurons, with a P value adjusted for false discovery
rate (Padj) of < 0.1. Strikingly, we found that the expression of
multiple mitochondrial genes was significantly enhanced in the BD
neurons (Fig. 1h). Clinical studies have revealed that people with
mitochondrial cytopathies harbour a high risk of psychiatric disorders,
including BD. Hence, we investigated the mitochondrial
function in young DG-like neurons by measuring the mitochondrial
membrane potential (MMP) using the JC-1 assay (Fig. 1i). Flow
cytometry analysis revealed that BD neurons showed increased
red/green ratios, indicative of enhanced mitochondrial function
(Fig. 1j, k and Extended Data Fig. 3a), a finding that is in line with
the upregulated mitochondrial gene expression. We next measured
the size of neuronal mitochondria, which was represented by the area
of DsRed2-mito puncta (Fig. 1l). Compared with normal neurons,
the young BD neurons had smaller mitochondria (Fig. 1m, n and
Extended Data Fig. 3b). It has been suggested that microtubule-based
transport of mitochondria interacts with their dynamics (fusion/
fission; morphology or size) and MMP. Moreover, neuronal activity
is increased with fast mitochondrial transport and vice versa. Thus,
the smaller size and higher MMP of mitochondria in BD neurons
probably assist their transport, which might lead to enhanced
neuronal activity.

To explore the possible fold change of the mitochondrial alterations
in the BD neurons, we expanded our standard of RNA-seq analysis to
|log₂(fold change)| > 1 and P < 0.05. We found that 1,005 genes were
significantly upregulated and 153 genes were downregulated in the
BD neurons compared with controls (Fig. 1g). Kyoto Encyclopedia
of Genes and Genomes (KEGG) analysis revealed that the Ca²⁺
signalling and neuroactive ligand-receptor interaction pathways were
significantly altered (Supplementary Table 1). Gene ontology (GO)
analysis suggested that genes involved in the protein kinase A and C
(PKA/PKC) signalling pathways and the action potential (AP)
firing system were upregulated (Fig. 2a and Supplementary Tables 2
and 3). These observations were verified using quantitative reverse
transcription PCR (qRT-PCR) analysis on representative genes
(Fig. 2b). Given the facts that enhanced mitochondrial function
provides an extra energy resource for AP firing and that upregulation
of the PKA/PKC pathways can enhance AP firing, it is likely that
AP firing efficiency is altered in BD.

Figure 1 | Hippocampal DG granule cell-like neurons derived from
patients with BD show gene expression and mitochondrial abnormalities.
a, Schematic: generation of DG-like neurons from BD iPSCs.
b, Immunostainings of iPSCs for TRA-1-60 and Nanog, neural rosettes
and neural progenitor cells for SOX2 and Nestin, and neurons for MAP2
and TUJ1. c, Immunostainings of neurons labelled with VGLUT1, MAP2,
Prox1::eGFP and GABA. Scale bars, 50 μm for b and c. d, Quantification of
VGLUT1-positive glutamatergic neurons (normal, n = 8; BD, n = 12 lines),
Prox1::eGFP-positive DG-like neurons (normal, n = 8; BD, n = 12 lines) and
GABAergic neurons (normal, n = 4; BD, n = 12 lines). e, Immunostaining of
dendritic glutamatergic synapses and axonal GABAergic synapses. Scale bar,
5 μm. f, Quantification of glutamatergic and GABAergic synapse densities
(VGLUT1: normal, n = 30 neurons from 8 lines; BD, n = 78 from 12 lines.
GABA: normal, n = 30 from 6 lines; BD, n = 88 from 12 lines).
g, Heat map of differential gene expression in normal and BD neurons.
h, Bar graph summarizing differential expression of mitochondrial genes
in BD and normal neurons. i, Schematic rationale of JC-1. j, JC-1 flow
cytometry graphs showing that, as a control, CCCP diminishes neuronal
MMP and that BD neurons have elevated MMP. k, Quantification of
elevated MMP in BD neurons compared with normal (normal, n = 8 lines
from 4 subjects; BD, n = 12 lines from 6 subjects). l, m, Neurons expressing
DsRed2-mito and Prox1::eGFP. Scale bars, 50 μm (l) and 20 μm (m).
n, DsRed2-mito puncta sizes reduced in BD neurons. Identical symbols
indicate same subject (normal, n = 29 cells from 8 lines; BD, n = 39
from 12 lines). Student's t-test, *P < 0.05; **P < 0.001. Bars, mean ± s.e.m.
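
The relaxed selection criterion used for the expanded RNA-seq analysis (|log₂(fold change)| > 1 and P < 0.05) amounts to a simple filter on per-gene results. The sketch below is only an illustration of that rule; the record type, function name and gene labels are hypothetical, and it does not reproduce the authors' analysis pipeline.

```python
from typing import NamedTuple


class GeneResult(NamedTuple):
    gene: str
    log2_fold_change: float  # BD relative to normal neurons
    p_value: float


def split_differential_genes(results, min_abs_log2fc=1.0, max_p=0.05):
    """Return (upregulated, downregulated) gene names passing both thresholds."""
    up, down = [], []
    for r in results:
        if r.p_value < max_p and abs(r.log2_fold_change) > min_abs_log2fc:
            # Positive log2 fold change means higher expression in BD neurons.
            (up if r.log2_fold_change > 0 else down).append(r.gene)
    return up, down


# Hypothetical records, for illustration only.
example = [
    GeneResult("GENE_A", 1.5, 0.01),
    GeneResult("GENE_B", -2.0, 0.03),
    GeneResult("GENE_C", 0.5, 0.001),
    GeneResult("GENE_D", 3.0, 0.20),
]
up_genes, down_genes = split_differential_genes(example)
```

A gene must pass both thresholds to be counted, so a large fold change with a non-significant P value, or a significant but small change, is excluded.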


We therefore performed patch-clamp recording experiments to
compare the AP firing patterns of BD and normal iPSC-derived,
3-week-old Prox1::eGFP-labelled DG-like neurons, which had normal
synaptic transmission (Fig. 2c, d). Compared with the control neurons,
BD neurons exhibited greater activation of Na⁺ channels, lower AP
threshold and greater values of evoked AP number and maximal AP
amplitude (Fig. 2e–k and Extended Data Fig. 3c–f). Further analysis
of spontaneous AP firing revealed that the BD neurons showed higher
AP frequencies (Fig. 2l–n and Extended Data Fig. 3g). These observations
are consistent with the RNA-seq and qRT-PCR results. Although
an enhanced expression of K⁺ channel subunits was also detected, our
patch-clamp recording results did not show any significant changes
in the K⁺ currents (Extended Data Fig. 4). Given the fact that K⁺
channel subunits are often overexpressed in epilepsy as a compensatory
response, the upregulated K⁺ channel subunit expression in the
BD neurons is probably a homeostatic change by which the neurons
attempt to counteract their hyperactivity.

Figure 2 | Hippocampal neurons derived
from patients with BD show hyperexcitability.
a, b, Average expression of representative
genes involved in the PKA/PKC and AP firing
systems revealed by RNA-seq (a) and qRT-PCR
(b) analysis (normal, n = 4; BD, n = 6 lines).
c–e, Patch-clamp recording on Prox1::eGFP-expressing
DG-like neurons (c) showed
spontaneous postsynaptic currents (d) and
Na⁺/K⁺ currents (e). Scale bar, 20 μm. f, Average
peak values of Na⁺ currents during stepwise
depolarization (normal, n = 40 neurons from
8 lines; BD, n = 52 from 12 lines). g, Normalized
average Na⁺ currents at different membrane
potentials. h–k, Sample trace (h), average firing
threshold (i), average total number (j) and
maximal amplitude (k) of APs evoked during
300 ms stepwise depolarization. Identical symbols
indicate same subject (for AP threshold: normal,
n = 39 neurons from 8 lines; BD, n = 55 from
12 lines; for total AP number: normal, n = 39
from 8 lines; BD, n = 58 from 12 lines; for
maximal amplitude: normal, n = 39 from 8 lines;
BD, n = 57 from 12 lines). l–n, Sample trace
(l), average frequency (m) and mean amplitude
(n) of spontaneous APs (for AP frequency:
normal, n = 29 neurons from 6 lines; BD,
n = 30 from 8 lines). Student’s t-test, *P < 0.05;
**P < 0.001. Bars, mean ± s.e.m.

To test the suitability of our BD iPSC model for studying new clini- 
cal therapies and drugs, we next set out to investigate the consistency 
of the hyperactivity phenotype shown in the BD patient-derived neu- 
rons with the clinical defects of the patients. Clinically, lithium (Li) has 
been widely used to treat BD mania. In our study, the recruited sub- 
jects included three Li-responsive (LR) and three Li non-responsive 
(NR) patients (Supplementary Table 4). LR and NR patient-derived 
neurons exhibited similar percentages of Prox1-positive DG-like
cells (Extended Data Fig. 5a, b). Hence, while we were comparing 
the electrophysiological activity of the BD and normal cells, we also 
investigated the effects of Li on the activity of the two subgroups of 
BD neurons in parallel, using 1-week chronic application of 1 mM
LiCl. In 3-week-old neurons derived from LR patients, Li significantly
reduced Na⁺/K⁺ currents (Fig. 3a–c), the total number of evoked APs
(Fig. 3d, e) and the frequency of spontaneous APs (Fig. 3f, g), whereas 
the AP amplitudes and threshold remained unaffected (Extended Data 
Fig. 5c-e). In contrast, Li failed to induce any obvious changes 
in the NR neurons; however, NR neurons could be affected by the 
anti-epileptic drug lamotrigine (Extended Data Fig. 6). These results 
indicated that the hyperactivity of the DG-like neurons that were 
derived from clinically Li-responsive patients could be selectively 
diminished by Li treatment. Therefore, the neuronal hyperactivity 
revealed by our BD iPSC model is directly associated with the clinical 
symptom of mania in the patients with BD. 

To explore the mechanisms that might underlie the Li-induced
reduction of neuronal activity in the LR neurons, we performed
RNA-seq analysis of Li-treated neurons. We found that, in the NR
neurons, 40 genes were changed by the Li treatment; in sharp contrast,
560 genes in the LR neurons were significantly affected, of
which 238 were upregulated and 322 were downregulated (Fig. 3h,
i). Hence, Li can specifically affect the gene expression profiles of
the LR neurons. Further analysis revealed that Li rescued 84 genes 
in the LR neurons to varying degrees, including the gene(s) that are 
probably key for the BD pathology and Li response, and thus could 
potentially be used to develop DNA predictor systems. Of these 
84 genes, those involved in the PKA/PKC pathways and AP firing, 
such as PDE11A, PRKCH, PTPRB and SCN11A, as well as multi- 
ple mitochondria-related genes, were significantly downregulated, 
and the Na⁺/K⁺ ATPase pathway gene NKAIN was upregulated
(Fig. 3j, k), indicating the attenuation of the PKA/PKC pathways, AP
firing system, and mitochondrial functions. Indeed, Li partly rescued
mitochondrial dysfunction by increasing the mitochondria size
in 3-week-old LR neurons, whereas the MMP remained unaffected
(Fig. 3l–n). It thus appears conceivable that Li diminishes hyperactivity
of the LR neurons through reversing aberrant gene expression related
to these pathway(s). In addition, we found that the expression of K⁺
subunits (KCNA1, KCNJ12) was also significantly downregulated in
response to the Li treatment (Fig. 3j), probably because of a neuronal 
homeostatic response to the loss of neuronal activity. 

We next tested whether the enhanced excitability of single neurons
could generate neural network hyperactivity through assaying
somatic Ca²⁺ transient events with a calcium indicator, Fluo 4-AM.
Ca²⁺ events were abolished by tetrodotoxin (TTX) (Fig. 4a, b), indicating
that they represent APs spreading over the neural network. As
Prox1-expressing DG-like neurons accounted for approximately 80%
of all neurons in the culture dish (Fig. 1c, d), Ca²⁺ imaging of synapsin
promoter-driven lentiviral vector expressing DsRed (Syn::DsRed)-labelled
neurons was able to monitor the activity of the granule cells.
Compared with the normal group, the BD LR and NR neural networks
both showed a significantly higher frequency of Ca²⁺ events
(Fig. 4c–e). In the LR neural network, Li application resulted in a
remarkable reduction both in the Ca²⁺ event frequency and in the
percentage of signalling neurons (Fig. 4d, e and Extended Data
Fig. 3g), whereas the BD NR network was unaffected. Interestingly,
normal neurons did not show any obvious changes either (Extended
Data Fig. 7). In addition, we observed that this hyperexcitability would
reverse when the diseased neurons became old (Extended Data Fig. 8),
which may represent an early sign of the reported loss of mature
hippocampal neurons in the BD brain and/or might be associated with
the transition of the patients from mania into depression.

5 NOVEMBER 2015 | VOL 527 | NATURE | 97

© 2015 Macmillan Publishers Limited. All rights reserved

[Figure 3 image. Recoverable axis labels: peak Na⁺ current (nA); total number of evoked APs; fold change of gene expression; log₂(FC); mitochondria size; JC-1 red:green ratio. Groups: LR and NR neurons (three donors each), with and without LiCl; 560 genes (LR) and 40 genes (NR).]

Previously, hyperexcitability had been observed in the ventral tegmental
area dopaminergic neurons and hippocampal DG neurons of
BD animal models, and thus was thought to be an endophenotype
of this disease. However, it remained unclear whether this phenotype
could represent the clinical symptoms of BD. In the present
study, we generated iPSCs from the fibroblasts of clinically diagnosed
patients with BD and demonstrated that 3-week-old diseased neurons
derived from iPSCs exhibited significantly upregulated neuronal
activity. Importantly, we found that treating neurons with Li selectively
diminished this abnormality only in neurons derived from those
patients who were responsive to clinical Li administration. Notably,
in a neuronal model of schizophrenia generated by the identical
approach, we did not observe hyperactivity in the diseased neurons
(data not shown). Therefore, our findings indicated that neuronal
hyperexcitability is specifically associated with the clinical symptoms
of patients with BD.

Figure 3 | Li rescues hyperexcitability in
hippocampal neurons derived from iPSCs of
patients with BD. a, Na⁺/K⁺ currents recorded
from BD LR and NR neurons with and without
Li. b, c, Effects of Li on average peaks of Na⁺
currents (b) and K⁺ currents (c) in the LR and NR
neurons. Identical symbols indicate same subject
(LR without Li, n = 26 neurons from 5 lines; with
Li, n = 19 from 5 lines). d, Representative traces of
APs evoked during 300 ms stepwise depolarization
periods. e, Scatter graph showing Li-induced
decrease in the average total AP number of the LR
neurons (LR without Li treatment, n = 27 neurons
from 5 lines; with Li treatment, n = 18 from
5 lines). f, Representative traces of spontaneous
APs. g, Spontaneous AP firing frequency in
Li-treated LR neurons (LR without Li, n = 11
neurons from 3 lines; with Li, n = 10 from 3 lines).
h, i, Heat maps (h) and MA plots (i) showing
effects of Li treatment on gene expression in LR
and NR neurons. j, k, Effects of Li on the average
expression of representative PKA/PKC/AP (j) and
mitochondrial genes (k) in the LR neurons (with
Li, n = 3; without Li, n = 3 lines). l, m, Sample
images of neurons (l) and bar graph (m) showing
the effects of Li treatment on mitochondria
morphology. Scale bar, 10 μm (n = 19 neurons
from 6 lines). n, No effects of Li treatment on
MMP of the BD neurons (n = 6 lines for each
group). Student's t-test, *P < 0.05; **P < 0.001.
Bars, mean ± s.e.m.

As indicated earlier, patients with BD differ in response to Li; a 
subset of patients has a very robust response with excellent control 
of symptoms whereas others do not. This variability in response 
leads frequently to many years of trial-and-error efforts to identify 
the optimal medication. Recognition of this differential responsive- 
ness to Li may lead not only to novel treatments but also to DNA or 
other biomarker predictors of response that can accelerate treatment 
optimization and provide precision medicine in psychiatry. Using 
neuronal hyperactivity and Li responsiveness as two indices, we 
detected correlated changes in the PKA/PKC/AP and mitochondria 
genes in the BD neurons, indicating that these pathways might be 
related to neuronal hyperexcitability. Further investigations will be 
necessary to determine whether mitochondrial alterations and/or 
PKA/PKC/AP gene expression changes represent a cause or a consequence
of the observed hyperexcitability phenotype and whether
the reversal of hyperexcitability represents further progression of
the disease. 

In summary, the cell-autonomous findings revealed by our BD
neuronal model based on iPSC technology represent an important
first step in understanding the pathophysiology of BD, improving
diagnosis and perhaps developing novel therapeutics for treatment
of the disease.

Figure 4 | Somatic Ca²⁺ imaging analysis reveals hyperactivity in the
neural network formed by the BD iPSC-derived neurons. a, b, Sample
traces (a) and bar graph (b) showing neuronal Ca²⁺ transients abolished
by tetrodotoxin (n = 10 images). c, Representative Ca²⁺ traces in normal,
BD LR and NR neurons. d, Effects of Li treatment on the average ratio of
neurons exhibiting Ca²⁺ events. Identical symbols indicate the same subject
(n = 23 images from 6 lines). e, Scatter graph (left) and analysis of variance
(ANOVA) (right) showing the average Ca²⁺ event frequencies in normal and
BD neurons treated with Li (normal, n = 43 images from 8 lines; LR, n = 23
from 6 lines; NR, n = 23 from 6 lines). Student’s t-test (b) and ANOVA (d, e),
*P < 0.05; **P < 0.001. Bars, mean ± s.e.m.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Subjects. Patients with type I BD were participants in one of two prospective clin- 
ical trials of Li monotherapy for identifying genetic variants predictive of good 
Li response. These studies included one of Li response in veterans conducted at 
the University of California, San Diego; the other was the Pharmacogenomics 
of Bipolar Disorder Study (clinical trial number NCT01272531; Supplementary 
Table 5). All subjects provided written informed consent and all procedures were 
approved by local human subjects committees. Subjects were initially screened for 
eligibility and diagnoses determined using the Diagnostic Interview for Genetic 
Studies. Subjects were started on Li. Over 4 months they were titrated up to a 
therapeutic level (1.0 meq dl⁻¹) as tolerated, as other psychotropic medications
were gradually discontinued. Subjects were seen at 2- to 4-week intervals and rated 
for mood symptoms. After 4 months, subjects with a Clinical Global Impressions 
Scale score of 3 or less, reflecting only mild symptoms, were declared responders, 
whereas other subjects were deemed non-responders. Li responders were then 
followed on Li monotherapy for up to 2 years. The responders remained stable for 
an average of 23 months on Li monotherapy, whereas the non-responders failed 
to remit from their index episode after an average trial of Li of 3 months. All 
subjects were white males. The characteristics of these subjects are detailed in 
Supplementary Table 4. Four-millimetre punch skin biopsies were obtained under 
sterile conditions from a few centimetres below the iliac crest. 
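The responder rule described above can be written as a small check. This is only an illustrative sketch: the threshold (a Clinical Global Impressions score of 3 or less after 4 months) comes from the text, whereas the `Subject` record, the field names and the assumed 1 to 7 score range are hypothetical and were not part of the study's procedures.

```python
from dataclasses import dataclass

# From the text: a CGI score of 3 or less after 4 months defines a responder.
CGI_RESPONDER_MAX = 3


@dataclass
class Subject:
    subject_id: str        # hypothetical identifier, for illustration only
    cgi_at_4_months: int   # CGI score; a 1-7 range is assumed here


def classify(subject: Subject) -> str:
    """Return 'LR' for a Li responder and 'NR' for a non-responder."""
    if not 1 <= subject.cgi_at_4_months <= 7:
        raise ValueError("CGI score expected in the range 1-7")
    return "LR" if subject.cgi_at_4_months <= CGI_RESPONDER_MAX else "NR"
```

For example, `classify(Subject("s1", 2))` gives `"LR"`, matching the LR and NR group labels used in the figures.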

iPSC reprogramming and neuron differentiation. BD and normal iPSCs were 
derived from fibroblasts using a Cyto-Tune Sendai reprogramming kit (Invitrogen) 
according to the manufacturer's instructions. All iPSCs were characterized as pre- 
viously described”®. iPSC colonies were cultured on Matrigel-coated dishes (BD 
Biosciences) using mTeSR1 medium (StemCell Technologies). Embryoid bodies 
were formed by mechanical dissociation of iPSC colonies using collagenase and 
plating onto low-adherence dishes in DMEM/F12 (Invitrogen) supplemented with 
N2 and B27. For embryoid body differentiation, floating embryoid bodies were 
treated with DKK1 (0.5 µg ml⁻¹), SB431542 (10 µM), noggin (0.5 µg ml⁻¹) and
cyclopamine (1 µM) for 20 days. To obtain neural progenitor cells, embryoid bodies
were plated onto polyornithine/laminin (Sigma)-coated dishes in DMEM/F12
plus N2 and B27. Rosettes were manually collected and dissociated with accutase
(Chemicon) after 1 week and plated onto laminin-coated dishes in neural
progenitor cell media (DMEM/F12, 1× N2, 1× B27 (Invitrogen), 1 µg ml⁻¹ laminin
and 20 ng ml⁻¹ FGF2 (Invitrogen)). To obtain mature neurons, neural progenitor
cells were differentiated in DMEM/F12 supplemented with 1× N2, 1× B27,
20 ng ml⁻¹ BDNF (Peprotech), 1 mM dibutyryl-cyclic AMP (Sigma), 200 nM ascorbic
acid (Sigma), 1 µg ml⁻¹ laminin and 620 ng ml⁻¹ Wnt3a (R&D) for 3 weeks.
Wnt3a was removed after 3 weeks. All cells used in the present study were verified 
as free from mycoplasma contamination. 

Generation of lentivirus. Lentivirus was packaged in 293T HEK cells grown in 
DMEM/F12 (Invitrogen) supplemented with 10% FBS (Gemini). The 293T cells 
cultured in a 15-cm dish were transfected with a solution consisting of 12.2 µg
lentiviral DNA, 8.1 µg MDL-gagpol, 3.1 µg Rev-RSV, 4.1 µg CMV-VSVG, 500 µl
of Opti-MEM (Invitrogen) and 110 µl PEI (1 µg ml⁻¹). Medium was changed after
12 h and the virus was harvested at 72 h after transfection.

Immunocytochemistry. Cells were fixed in 4% paraformaldehyde and then per- 
meabilized with 0.25% Triton-X100 in PBS. Cells were then blocked in Tris-Cl 
buffer solution (TBS) containing 0.25% Triton-X100 and 10% donkey serum 
for 1h, followed by incubation with primary antibody overnight at 4°C. After 
three washes with TBS, cells were incubated with secondary antibodies for 1h at 
room temperature. After three washes with TBS, cells were incubated with DAPI 
(0.1 µg ml⁻¹, Sigma) for 15 min, followed by three washes with TBS to remove
DAPI. Fluorescent signals were detected using a Zeiss 710 laser scanning micro- 
scope and images were processed with ZEN 2011, Adobe Photoshop CS5 and 
ImageJ 1.42 software. The primary antibodies used were mouse anti-TRA-1-60
monoclonal antibody (1:200, Chemicon catalogue number MAB4360), goat 
anti-Nanog polyclonal antibody (1:200, R&D catalogue number AF1997), goat 
anti-SOX2 polyclonal antibody (1:250, Santa Cruz catalogue number sc-17320), 
mouse anti-Nestin monoclonal antibody (1:200, Chemicon catalogue num- 
ber MAB5326), rabbit anti-TUJ1 polyclonal antibody (1:500, Covance cata- 
logue number PRB-435P), chicken anti-MAP2 polyclonal antibody (1:1,000, 
Abcam catalogue number ab5392), rabbit anti-VGLUT1 polyclonal antibody
(1:200, Synaptic Systems catalogue number ab5392), rabbit anti-GFP antibody 
(1:500, Invitrogen catalogue number A6455) and rabbit anti-GABA polyclonal 
antibody (1:1,000, Sigma catalogue number A2052). The secondary antibodies
(Jackson ImmunoResearch Laboratories) used were goat anti-chicken
Alexa Fluor 647 (1:500, catalogue number 703-605-155), goat anti-rabbit CY3 
(1:500, catalogue number 111-165-003), donkey anti-chicken Dylight 488 
(1:500, catalogue number 703-485-155), donkey anti-rabbit CY3 (1:500, cat- 
alogue number 711-165-152), donkey anti-rabbit Alexa 488 (1:500, catalogue 
number 711-545-152), donkey anti-chicken Dylight 549 (1:500, catalogue
number 703-505-155), donkey anti-goat Alexa 488 (1:500, catalogue number
705-545-147), donkey anti-mouse CY3 (1:500, catalogue number 715-165-151), 
donkey anti-goat CY3 (1:500, catalogue number 705-165-147) and donkey anti- 
mouse Alexa 488 (1:500, catalogue number 715-545-151). All relevant information 
about the antibodies used in this study, including citation, clone number and anti- 
body validation profile, can be found at the manufacturers’ websites. 

RNA extraction, PCR and quantitative RT-PCR. Total cellular RNA was 
extracted from approximately 5 x 10° cells using the RNA-BEE (Qiagen) according 
to the manufacturer’s instructions, and reverse transcription was performed using 
a High Capacity cDNA Synthesis kit (AB Biosystems). PCR was performed using a 
GoTAQ PCR kit (Fisher Scientific), and PCR products were analysed using agarose 
gel electrophoresis. Quantitative PCR was performed using SYBR Green (Invitrogen),
and the results were analysed using SDS3.2 software for a 7900HT real-time PCR 
system. The primer sequences used are described in Supplementary Table 6.

Somatic calcium imaging. Three-week-old neurons derived from BD and normal
iPSCs were previously infected with a synapsin promoter-driven lentiviral vector 
expressing DsRed (Syn::DsRed). Cell cultures were washed twice with sterile Krebs 
HEPES Buffer and incubated with 3 µM Fluo 4-AM (Molecular Probes) in Krebs
HEPES Buffer for 40 min at room temperature. Excess dye was removed by washing 
twice with Krebs HEPES Buffer, and cells were incubated for an additional 20 min 
to equilibrate the intracellular dye concentration and allow de-esterification.
Time-lapse image sequences (×100 magnification) of 3,000 frames were acquired at
28 Hz with a region of 336 pixels × 256 pixels using a Hamamatsu ORCA-ER digital
camera (Hamamatsu Photonics) with a 488 nm (FITC, fluorescein isothiocyanate)
filter on an Olympus IX81 inverted fluorescence confocal microscope (Olympus
Optical). To assess changes in calcium signalling in response to perturbation of
neuronal activity, tetrodotoxin (1 µM) was applied by bath application. Images were
acquired with MetaMorph 7.7 software (MDS Analytical Technologies). Images 
were subsequently processed using ImageJ software and custom written routines 
in Matlab 7.2 software (Mathworks). 
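The custom Matlab routines are not described in the text, so the following Python sketch only illustrates one common way such traces could be turned into event frequencies. The frame rate (28 Hz) and sequence length (3,000 frames) are taken from the text; the percentile baseline, the ΔF/F₀ threshold of 0.5 and all function names are assumptions made for illustration and should not be read as the authors' method.

```python
import numpy as np

FRAME_RATE_HZ = 28.0  # from the text
N_FRAMES = 3000       # from the text (about 107 s per sequence)


def delta_f_over_f(trace, baseline_percentile=10.0):
    """Return dF/F0, with F0 taken as a low percentile of the trace (assumption)."""
    trace = np.asarray(trace, dtype=float)
    f0 = np.percentile(trace, baseline_percentile)
    return (trace - f0) / f0


def count_events(dff, threshold=0.5):
    """Count upward crossings of a hypothetical dF/F0 threshold."""
    above = np.asarray(dff) > threshold
    if above.size == 0:
        return 0
    # An event starts wherever the trace moves from below to above threshold.
    onsets = np.flatnonzero(above[1:] & ~above[:-1])
    return int(onsets.size) + int(above[0])


def events_per_minute(trace, frame_rate_hz=FRAME_RATE_HZ, threshold=0.5):
    """Event frequency for one somatic region of interest."""
    duration_min = len(trace) / frame_rate_hz / 60.0
    return count_events(delta_f_over_f(trace), threshold) / duration_min
```

A per-image value could then be obtained by averaging `events_per_minute` over the regions of interest in that image; how the original analysis pooled cells is not stated here.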

Electrophysiology. Neurons were previously infected with the Prox1::eGFP 
lentiviral vector. Whole-cell patch-clamp recordings were performed from 
Prox1::eGFP highlighted DG-like neurons after 3 weeks of differentiation. The 
bath was constantly perfused with an extracellular solution (128 mM NaCl, 5 mM
KCl, 2 mM CaCl₂, 30 mM glucose, 1 mM MgCl₂ and 25 mM HEPES (pH 7.3)).
The recording micropipettes (tip resistance 3-6 MΩ) were filled with internal
solution (130 mM K-gluconate, 1 mM EGTA, 2 mM Mg-ATP, 0.3 mM Na-GTP,
5 mM Na-phosphocreatine and 10 mM HEPES (pH 7.3)). Recordings were made
using an Axopatch 200B or 700B amplifier (Axon Instruments). Signals were filtered
at 2 kHz and sampled at 5 kHz. The series resistance was typically <15 MΩ. For
voltage-clamp recordings, the membrane potential was held at −70 mV. To record
the sodium and potassium currents, cells were depolarized in 5 mV increments. For
current-clamp recordings, a hyperpolarized current was injected into the neuron
to a membrane potential of −55 mV or −45 mV, depending on the experiments.
Step-depolarized currents with identical parameters were injected into normal and 
BD neurons to elicit APs. All recordings were performed at room temperature and 
chemicals were purchased from Sigma. 

Mitochondrial assay and flow cytometry. To measure mitochondrial size, 
Prox1::eGFP and DsRed2-mito were co-expressed in DG-like neurons via lentiviral 
infection. Neurons were fixed in 4% paraformaldehyde and then permeabilized 
with 0.1% Triton-X100 in TBS. Cells were then blocked in TBS containing 3% 
donkey serum for 1h, followed by incubation with DAPI for 15 min. Fluorescence 
images were acquired using a high-resolution LSM 710 confocal microscope 
(Carl Zeiss) and were processed with ZEN 2011 software (Carl Zeiss) and Adobe 
Photoshop CS5 software (Adobe). The size of the mitochondria (DsRed2-mito 
puncta) was analysed using the Particle Analysis tool in ImageJ software (National 
Institutes of Health). 

For MMP, neurons were incubated with JC-1 dye (Molecular Probes) at 37°C for 
15-30 min”’. The cells were dissociated into single cells using TrypLE (Invitrogen), 
washed three times and then resuspended in 1 ml warm PBS. Green and red flu- 
orescence of JC-1 dye was quantitated using BD FACSCanto II flow cytometer 
(Becton, Dickinson). Histogram plots of green and red fluorescence were created 
to determine the red/green intensity ratio using FlowJo 10 software (TreeStar). 
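As a rough illustration of the red/green readout, the ratio for one sample can be computed from per-cell fluorescence values. The text states only that the ratio was determined from histogram plots in FlowJo; summarizing each channel by its mean, and the function names below, are assumptions made for this sketch.

```python
import numpy as np


def jc1_ratio(red, green):
    """Red/green ratio of mean JC-1 fluorescence for one sample (assumed summary)."""
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    if green.mean() <= 0:
        raise ValueError("green channel mean must be positive")
    return red.mean() / green.mean()


def relative_mmp(sample_ratio, control_ratio):
    """Express a sample's ratio relative to a control sample (hypothetical helper)."""
    return sample_ratio / control_ratio
```

A higher red/green ratio indicates a higher mitochondrial membrane potential, so `relative_mmp` above 1 would correspond to a sample with higher MMP than its control.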
RNA-seq analysis. RNA was prepared into RNA-Seq libraries using an Illumina 
TruSeq Stranded Total RNA Sample Prep Kit with Ribo-zero Gold (Human/Mouse/ 
Rat) (Illumina). Cytoplasmic and mitochondrial ribosomal RNA was depleted 
using a Ribo-zero Gold component. Depleted RNA was reverse transcribed into 
cDNA using SuperScript II reverse transcriptase (Invitrogen). Stranded cDNA 
sequencing libraries were generated according to Illumina's procedures. Total RNA-Seq
libraries were sequenced paired-end 2 × 100 base pairs (bp) using the Illumina
HiSeq 2500 platform according to the manufacturer's specifications. Low-quality 
ends and read-through adaptor sequences were trimmed using Cutadapt, version 
1.3. The trimmed reads were mapped to the human genome (hg19/GRCh37) using 
STAR, version 2.3.0. The assignment of reads to gene regions was performed by
htseq-count, version 0.5.4p5. These raw counts were taken as input for the edgeR
package for differential gene expression analysis using the exact and paired Student's
t-test described in the edgeR manual. DAVID (http://david.abcc.ncifcrf.gov/) was 
used to perform the gene functional annotation analysis. The categories of GO 
and KEGG pathways were chosen as background databases. All genes of Homo 
sapiens were used as background gene list. The RNA-seq data have been depos- 
ited in NCBI’s Gene Expression Omnibus under accession number GSE58933. 
The Prox1::eGFP-positive DG-like neurons showed gene expression similar to the 
whole differentiation culture (Extended Data Fig. 9). 

Statistical analysis. No statistical methods were used to predetermine sample size. 
The experiments were not randomized. 

For comparisons of Ca²⁺ imaging results among the normal, LR and NR groups,
the difference was assessed using one-way ANOVA followed by Duncan's test; 
the P value was adjusted by Benjamini and Hochberg correction, and an adjusted 
P value <0.05 was considered as significant. For RNA-seq, the data were ana- 
lysed using the edgeR package*”. For pairwise comparisons, we used quantile- 
adjusted conditional maximum likelihood methods. The common dispersion 
was calculated using the estimateCommonDisp function. The exact test is based on
quantile-adjusted conditional maximum likelihood methods. Knowing the 
conditional distribution of the sum of counts in a group, edgeR computes exact 
P values by summing over all sums of counts that have a probability less than the 
probability under the null hypothesis of the observed sum of counts. Benjamini 
and Hochberg’s algorithm is used to control the false discovery rate. We performed 
paired comparisons to detect gene expression changes in response to Li treatment. 
This is an additive model with the patient as the blocking factor. For all other 
experiments, a two-tailed unpaired Student's t-test was used to determine the statis- 
tical significance of observed differences between various conditions. The analysis 
approaches have been justified as appropriate by previous biological studies, and all
data met the criteria of normal distribution. In all experiments, two lines from one 
patient were prepared, and one or two lines were eventually used for the experiment 
depending on the status of the cells, such as differentiation and cell density. For 
most experiments in this study, neurons of all patients and at similar densities were 
investigated (Extended Data Fig. 10), except that, for recordings of spontaneous 
AP firing, two patients with BD LR were investigated. The statistical data for each 
subject are listed in Supplementary Table 7. In the experiments, every cell line had 
a unique code that could not tell the identity of the subject but could tell which two 
lines belonged to the same subject, so that the person performing the experiments 
could use at least one line for each subject without knowing the group category. The 
collected data were used for statistical analysis without exclusion. All experiments 
were performed in technical and biological triplicate and were repeated at least 
three times. The variation within each group of neurons was not pre-estimated 
and the variation between groups might not be similar. For electrophysiological 
recording experiments, at least five to six neurons per subject were recorded and 
statistically analysed without exclusion. For Ca²⁺ imaging and immunostaining
experiments, typically four or five view fields per subject containing several
hundred neurons were used without exclusion for analysis. For RNA-seq, qRT- 
PCR and flow cytometry experiments, all cells in one culture were collected for 
analysis. 
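The Benjamini and Hochberg adjustment mentioned above follows a standard procedure, sketched below in Python for illustration. The analysis itself was run in edgeR; this function is not the authors' code, and its name is an assumption of this sketch.

```python
import numpy as np


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted P value by m / rank.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest P value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted
```

With this adjustment, a result is called significant when its adjusted P value falls below the chosen cut-off (0.05 for the Ca²⁺ imaging comparisons described above).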
Extended Data Figure 1 | Generation of iPSCs from patients with BD
and healthy people. a, Human fibroblasts generated from punch biopsy.
b, The iPSC colonies appeared after fibroblasts were reprogrammed using the
Sendai virus. c, Purified iPSC colonies were cultured in Matrigel-coated plate.
d, Immunostaining of iPSCs with DAPI and pluripotency markers Nanog
and TRA-1-60. e, RT-PCR results showing that the introduced Sendai virus
genes were cleared from the generated iPSCs. f, RT-PCR results showing
that the generated iPSCs expressed human pluripotency markers NANOG,
LIN28, OCT4, TDGF and cMYC. g, Representative karyotyping image of
generated iPSCs showing normal chromosomal structure. h-j, Bar graphs of
quantitative RT-PCR showing that the iPSCs can randomly differentiate into
cells expressing the markers for endoderm, mesoderm and ectoderm. Data
are representative for a total of 20 iPS cell lines from 10 patients (2 clones per
patient). Scale bar, 50 µm. Bars, mean ± s.e.m.

Extended Data Figure 2 | Lentiviral transduction of Prox1::eGFP
efficiently labels Prox1-positive DG granule cell-like neurons. a, Sample
immunostaining images showing the expression of Prox1 and Prox1::eGFP
in the normal and BD neurons. Scale bar, 100 µm. b, Bar graph showing
that, both in the normal and in BD groups, more than 90% of
Prox1::eGFP-positive neurons express nuclear Prox1 protein. Normal,
92.1 ± 2.4%, n = 4 lines; BD, 93.3 ± 1.2%, n = 12 lines. c, Bar graph showing
that, both in the normal and in BD groups, approximately 90% of
Prox1-positive DG-like neurons express Prox1::eGFP. Bars, mean ± s.e.m.

Extended Data Figure 3 | Bar graphs summarizing the similarity between
different cell lines of the same subject and comparison of low and high
passage cells. a, b, Bar graph comparing the MMP (n = 20 lines) (a) and
mitochondria size (n = 68 images from 20 lines) (b) of different cell lines of
one subject. c-f, Electrophysiological recording experiments, including peak
Na⁺ currents (n = 92 neurons from 20 lines) (c), AP threshold (94 neurons
from 20 lines) (d), total evoked AP number (n = 97 neurons from 20 lines)
(e) and maximal AP amplitude (n = 96 neurons from 20 lines) (f). g, Bar
graph comparing the frequency of Ca²⁺ transient events. Black bar, cell
line/clone 1; grey bar, cell line/clone 2 (178 videos from 20 lines). h, Bar graph
showing the normalized peak Na⁺ current in normal (NM) and BD neurons
derived from <P5 and >P9 cell lines (P5: normal, n = 40 neurons from
8 lines; BD, n = 52 from 12 lines. P9: normal, n = 11 from 2 lines; BD, n = 23
from 5 lines). Student's t-test, *P < 0.05. Bars, mean ± s.e.m.

Extended Data Figure 4 | K⁺ currents in the BD neurons. a, Average peak
values of K⁺ currents in the BD and normal neurons. b, Normalized average
K⁺ currents at different membrane potentials (normal, n = 35 neurons from
7 lines; BD, n = 41 from 10 lines). Student's t-test. Bars, mean ± s.e.m.

Extended Data Figure 6 | AP firing in the BD NR neurons treated with
lamotrigine (LTG). a, Representative traces of APs evoked during 300 ms
stepwise depolarization periods in the normal and NR neurons with and
without 100 µM lamotrigine treatment. b, c, Bar graphs summarizing the
effects of lamotrigine on the total number (b) and maximal amplitude (c) of
evoked APs in the normal and BD NR neurons (normal: without lamotrigine,
n = 7 neurons; with lamotrigine, n = 8. BD NR: without lamotrigine,
n = 5; with lamotrigine, n = 6). Student's t-test, *P < 0.05; **P < 0.001. Bars,
mean ± s.e.m.

during 300 ms stepwise depolarization periods in the normal neurons treated *P < 0.05. Bars, mean ± s.e.m.

Extended Data Figure 8 | Reversal of hyperexcitability in old BD neurons.
a, b, Sample traces (a) and scatter graph (b) showing that the 8-week-old BD
neurons exhibited weaker Na⁺ currents than the normal neurons (normal,
n = 28 neurons from 4 lines; BD, n = 37 from 6 lines). c, d, Sample traces (c)
and scatter graph (d) showing that the 8-week-old BD neurons exhibited a
lower frequency of Ca²⁺ transient events than the normal neurons (n = 30
videos from 10 patients). e, Scatter graphs showing the MMP of 6- and
8-week-old BD and normal neurons (normal, n = 3 lines; BD, n = 3 lines).
Student's t-test, *P < 0.05; **P < 0.001. Bars, mean ± s.e.m.

cytometry. b, Bar graph showing that Prox1::eGFP expression is enriched

Extended Data Figure 10 | Representative icons of the subjects in the
figures. a, Representative icons of the patients with BD and healthy people 
used in the experiments shown in the figures. Identical symbols indicate the 
same subject. 

Microenvironment-induced PTEN loss by exosomal
microRNA primes brain metastasis outgrowth 


Lin Zhang, Siyuan Zhang, Jun Yao, Frank J. Lowery, Qingling Zhang, Wen-Chien Huang, Ping Li, Min Li, Xiao Wang,
Chenyu Zhang, Hai Wang, Kenneth Ellis, Mujeeburahiman Cheerathodi, Joseph H. McCarty, Diane Palmieri, Jodi Saunus,
Sunil Lakhani, Suyun Huang, Aysegul A. Sahin, Kenneth D. Aldape, Patricia S. Steeg & Dihua Yu


The development of life-threatening cancer metastases at distant 
organs requires disseminated tumour cells’ adaptation to, and 
co-evolution with, the drastically different microenvironments of 
metastatic sites’. Cancer cells of common origin manifest distinct 
gene expression patterns after metastasizing to different organs’. 
Clearly, the dynamic interaction between metastatic tumour cells 
and extrinsic signals at individual metastatic organ sites critically 
effects the subsequent metastatic outgrowth**. Yet, it is unclear 
when and how disseminated tumour cells acquire the essential 
traits from the microenvironment of metastatic organs that prime 
their subsequent outgrowth. Here we show that both human and 
mouse tumour cells with normal expression of PTEN, an import- 
ant tumour suppressor, lose PTEN expression after dissemination 
to the brain, but not to other organs. The PTEN level in PTEN-loss 
brain metastatic tumour cells is restored after leaving the brain 
microenvironment. This brain microenvironment-dependent, 
reversible PTEN messenger RNA and protein downregulation is 
epigenetically regulated by microRNAs from brain astrocytes. 
Mechanistically, astrocyte-derived exosomes mediate an inter- 
cellular transfer of PTEN-targeting microRNAs to metastatic 
tumour cells, while astrocyte-specific depletion of PTEN-targeting 
microRNAs or blockade of astrocyte exosome secretion rescues the 
PTEN loss and suppresses brain metastasis in vivo. Furthermore, 
this adaptive PTEN loss in brain metastatic tumour cells leads to an 
increased secretion of the chemokine CCL2, which recruits IBA1- 
expressing myeloid cells that reciprocally enhance the outgrowth 
of brain metastatic tumour cells via enhanced proliferation and 
reduced apoptosis. Our findings demonstrate a remarkable plas- 
ticity of PTEN expression in metastatic tumour cells in response 
to different organ microenvironments, underpinning an essential 
role of co-evolution between the metastatic cells and their micro- 
environment during the adaptive metastatic outgrowth. Our find- 
ings signify the dynamic and reciprocal cross-talk between tumour 
cells and the metastatic niche; importantly, they provide new 
opportunities for effective anti-metastasis therapies, especially of 
consequence for brain metastasis patients. 

The remarkable phenotypic plasticity observed in metastasis is 
indicative of co-evolution occurring at specific metastatic organ 
microenvironments”®. To obtain insights into how disseminated 
tumour cells acquire essential traits from metastatic microenviron- 
ments for successful outgrowth, we analysed public gene expression 
profiles of clinical metastases from distinct organs as well as organ- 
specific metastases from mice injected with various cancer cells 
(Extended Data Fig. 1a-c). Notably, PTEN mRNA was markedly
downregulated in brain metastases compared to primary tumours or
other organ metastases. Our immunohistochemistry (IHC) analyses of
PTEN expression confirmed a significantly higher rate of PTEN
loss (defined by an immunoreactive score (IRS) of 0-3)’ in brain 
metastases (71%) than in unmatched primary breast cancers (30%) 
(Fig. 1a). PTEN loss was also detected at a significantly higher fre- 
quency in brain metastases (71%) than in matched primary breast 
cancers (37%) of an independent patient cohort (Fig. 1b). 

To test a possible role for PTEN loss in brain metastasis*’, we 
intracarotidly injected PTEN-knockdown tumour cells and assessed 
experimental brain metastasis; unexpectedly, neither incidence nor 
size of brain metastases was increased (Fig. 1c). Furthermore, patients 
with PTEN-normal or PTEN-loss primary tumours had comparable 
levels of brain-metastasis-free survival, and patients with or without 
brain metastases had similar PTEN levels in their primary tumours 
(Extended Data Fig. 1d, e). Thus, the observed PTEN loss in brain 
metastases was unlikely to be derived from PTEN-low primary 
tumours. To investigate whether PTEN loss in brain metastasis is a 
secondary non-genetic event imposed by the brain microenvironment, 
we injected five PTEN-normal breast cancer cell lines either into 
mammary fat pad (MFP) or intracarotidly to induce brain metastasis. 
Notably, the PTEN level was significantly decreased in brain metastases 
compared to the respective MFP tumours or lung metastases (Extended 
Data Fig. 2a, b). We repeated the injections with cells clonally expanded 
from single PTEN-normal tumour cells, and observed similar pheno- 
types (Fig. 1d), suggesting that PTEN-loss brain metastases were not 
selected from pre-existing PTEN-low cells in the primary tumours. 
Surprisingly, established sublines from PTEN-low brain metastases 
(primary Br cells) regained PTEN expression in culture comparable 
to parental cells (Fig. 1e). Analogously, two in-vivo-selected
brain-seeking sublines exhibited similar PTEN levels to their matched par-
ental cells in vitro (Extended Data Fig. 2c). Re-injecting the cultured 
PTEN-normal primary brain sublines conferred a distinct PTEN loss in 
secondary brain metastases, but not in secondary MFP tumours, and 
PTEN levels in secondary brain subline cells were fully restored again in 
culture (Fig. 1f, g and Extended Data Fig. 2d), indicating a reversible 
non-genetic PTEN loss in the brain tumour microenvironment (TME). 

To explore how the brain TME regulates PTEN in metastatic cells’, 
we co-cultured tumour cells with primary glia (>90% astrocytes)", 
cancer-associated fibroblasts (CAFs), or NIH3T3 fibroblasts. Co-culture 
with glia led to a significant decrease of PTEN mRNA and PTEN 
protein (Fig. 2a, b and Extended Data Fig. 2e, f) in all tumour cells, 
but did not affect PTEN promoter methylation or activity (Extended 
Data Fig. 2g, h). This prompted us to examine whether glia reduce 
PTEN mRNA stability through microRNAs (miRNAs). Five miRNAs 
(miR-17, miR-19a, miR-19b, miR-20a and miR-92) in the miR-17~92 
cluster were functionally demonstrated to target PTEN (refs 14-17), 
and Mirc1tm1.1Tyj/J mice have a floxed miR-17~92 allele!®. We
knocked out the miR-17~92 allele in situ in Mirc1tm1.1Tyj/J mice
by intracranial injection of astrocyte-specific Cre adenovirus
(Ad-GFAP-Cre), then intracarotidly injected syngeneic mouse melanoma
B16BL6 cells to form brain metastases (Fig. 2c). Astrocyte-specific
depletion of PTEN-targeting miRNAs blocked PTEN downregulation
(Fig. 2d) in the brain metastasis tumour cells in vivo without
significantly altering other potential miRNA targets (Extended Data Fig. 3a),
and significantly suppressed brain metastasis growth compared to the
control group (Fig. 2d, e), indicating a tumour cell non-autonomous
PTEN downregulation by astrocyte-derived PTEN-targeting miRNAs.
Astrocyte-specific depletion of PTEN-targeting miRNAs also
suppressed intracranially injected tumour cell outgrowth (Extended
Data Fig. 3b-f). To examine which PTEN-targeting miRNA primarily
mediates the PTEN loss in tumour cells when co-cultured with
astrocytes, the luciferase activities of the wild-type and mutated PTEN
3'-untranslated region (UTR) (containing various miRNA binding site
mutations) in tumour cells were assessed (Fig. 2f). Compared with CAF
co-culture, astrocyte co-culture inhibited luciferase activity of wild-type
PTEN 3'-UTR, which was rescued by the miR-19a binding site
mutation (position 1), but not by other mutations, indicating the major role
of miR-19a in astrocyte-mediated PTEN mRNA downregulation in
tumour cells. Furthermore, PTEN mRNA (Fig. 2g and Extended
Data Fig. 3g) and PTEN protein (Fig. 2h and Extended Data Fig. 3h)
were not downregulated in tumour cells co-cultured with primary
astrocytes from Mirc1tm1.1Tyj/J mice in which PTEN-targeting
miRNAs were depleted (Extended Data Fig. 3i).

Figure 1 | Brain microenvironment-dependent reversible PTEN
downregulation in brain metastases. a, Representative IHC staining and histograms
of PTEN protein levels in primary breast tumours (n = 139) and unmatched
brain metastases (mets) (n = 131) (Chi-square test, P < 0.001). b, Histograms
of PTEN protein levels in primary breast tumours and matched brain
metastases from 35 patients (Chi-square test, P = 0.0211). c, PTEN western
blots (left) and brain metastasis counts 30 days after intracarotid injection
(right) of MDA-MB-231Br cells transfected with control or PTEN shRNAs.
Macromets: >50 µm in diameter; micromets: ≤50 µm (mean ± s.e.m.,
Chi-square test, P = 0.1253). d, PTEN IHC staining of tumours derived from
clonally expanded PTEN-normal sublines. ICA, intracarotid artery; MFP,
mammary fat pad. e, Western blot and quantitative reverse transcriptase PCR
(qRT-PCR) of PTEN expression in the indicated parental (P) and
brain-seeking (Br) cells under culture (3 biological replicates, with 3 technical
replicates each). f, Schematic of in vivo re-establishment of secondary (2°) brain
metastasis, MFP tumour, and their derived cell lines. g, PTEN qRT-PCR
(mean ± s.e.m., t-test, 3 biological replicates, with 3 technical replicates each)
and PTEN IHC in HCC1954Br secondary tumours and cultured cells.

Figure 2 | Astrocyte-derived miRNAs silence PTEN in tumour cells.
a, PTEN mRNA in the indicated tumour cells after 2-5 days co-culture with
GFAP-positive primary glia or vimentin (vim)-positive primary CAFs or
NIH3T3 fibroblasts (mean ± s.e.m., t-test, 3 biological replicates, with 3
technical replicates each). b, Western blot of PTEN protein under co-culture as
in a (3 biological replicates). c, Schematic of astrocyte-specific miR-17~92
deletion by GFAP-driven Cre adenovirus (Ad-GFAP-Cre) in Mirc1tm1.1Tyj/J
mice. d, Representative image of tumour sizes and PTEN IHC of brain
metastases. e, Quantification of brain metastases volume (mean ± s.d., t-test,
P = 0.0024). f, PTEN 3'-UTR luciferase activity after co-culture (mean ± s.e.m.,
t-test, 3 biological replicates, with 3 technical replicates each). bp, base pairs.
g, RT-PCR analyses of miR-19a and PTEN mRNA in MDA-MB-231 cells
after 48 h co-culture with primary astrocytes from Mirc1tm1.1Tyj/J mice
pre-infected (48 h) by adenovirus (Ad-BGLuc or Ad-GFP-Cre) (mean ± s.e.m.,
t-test, P < 0.001, 3 biological replicates, with 3 technical replicates each).
h, Western blot of PTEN protein in MDA-MB-231 cells, co-cultured as in g.

After co-culture with Cy3-labelled miR-19a-transfected primary 
astrocytes, we detected significantly more Cy3⁺ epithelial cell adhesion
molecule (EpCAM)-positive tumour cells over time than under CAF 
co-culture (Fig. 3a and Extended Data Fig. 4a), suggesting that miR-19a 
is intercellularly transferred from astrocytes to tumour cells. miRNAs 
are transferable between neighbouring cells through gap junctions or 
small vesicles'*”°. Treating tumour cells with a gap junction channel 
inhibitor, carbenoxolone disodium salt, had no significant effect on
miR-19a intercellular transfer (data not shown), while adding astro- 
cyte-conditioned media to tumour cells led to an increase in miR-19a 
levels and a subsequent PTEN downregulation (Extended Data 
Fig. 4b–d). Recognizing the involvement of exosomes in neuronal function
and glioma development, we postulated that exosomes may
mediate miR-19a transfer from astrocytes to tumour cells. Indeed,
transmission electron microscopy detected spherical, membrane-encapsulated
particles between 30 and 100 nm, typical of exosome
vesicles, in astrocyte-conditioned media (Fig. 3b). Additionally, the
astrocyte-conditioned media contained significantly more CD63+,
CD81+ and TSG101+ exosomes than the CAF-conditioned media
(Fig. 3c and Extended Data Fig. 4e, f). Moreover, the exosomes from 
astrocytes contained 3.5-fold higher levels of miR-19a than those from 
CAFs (Extended Data Fig. 4g). Adding exosomes purified from condi- 
tioned media of Cy3-miR-19a-transfected astrocytes led to miR-19a 
transfer into cultured tumour cells (Fig. 3d). Furthermore, treating 
tumour cells directly with astrocyte-derived exosomes led to a dose- 
dependent increase of miR-19a and a subsequent decrease of PTEN 
mRNA in tumour cells (Fig. 3e). To determine whether astrocyte- 
released exosomes are required for miR-19a transfer, we blocked 
astrocyte exosome secretion by treating astrocytes with either an inhib- 
itor of exosome release, dimethyl amiloride (DMA), or a short inter- 
fering RNA (siRNA) targeting Rab27a, a mediator of exosome 
secretion (Extended Data Fig. 5a–c). Both exosome blockades
Figure 3 | Intercellular transfer of PTEN-targeting miR-19a to tumour cells
via astrocyte-derived exosomes. a, Intercellular transfer of miR-19a. Top,
light microscopy and fluorescent images of HCC1954 cells 12 and 60 h after
co-culture with astrocytes loaded with Cy3-labelled miR-19a. Bottom, flow
cytometry analysis of Cy3-miR-19a in tumour cells 60 h after co-culture
(mean ± s.e.m., t-test, P < 0.05, 3 biological replicates). b, c, Transmission
electron microscopy of exosome vesicles in astrocyte-conditioned media (b), 
confirmed by western blot for CD63, CD81 and TSG101 exosome markers 
released by 1 X 10° CAFs or astrocytes (c). d, Representative data showing 
presence of Cy3-miR-19a in HCC1954 breast cancer cells after adding 
exosomes purified from Cy3-miR-19a-transfected astrocytes for 24 h. Bottom,
flow cytometry analysis of Cy3-miR-19a-positive HCC1954 cells after 
treatment with supernatant (without exosomes), or exosomes purified from 
Cy3-miR-19a-transfected astrocytes. Negative control is HCC1954 cells 
without treatment. Positive control is Cy3-miR-19a-transfected astrocytes 
(3 biological replicates). e, Histogram of miR-19a and PTEN mRNA in
HCC1954 cells 48 h after addition of media, astrocyte supernatant, or exosomes
purified from astrocyte-conditioned media (mean ± s.e.m., t-test, 3 biological
replicates, with 3 technical replicates each). f, g, Histograms of miR-19a and
PTEN mRNA in HCC1954 cells after 48 h co-culture in conditioned media
from vehicle- or DMA-treated (4 h) astrocytes (f) and control- or Rab27a-siRNA-transfected
(48 h) astrocytes (g) (mean ± s.e.m., t-test, 3 biological
replicates, with 3 technical replicates each). h–j, Schematics of in vivo
experiments (h), IHC analyses of PTEN and exosome marker expression (i)
and changes of tumour volume (j) (mean ± s.d., t-test, n = 7, P = 0.0157).
B, brain; M, metastases; shRab27a/b, shRNA against Rab27a/b. k–m, Schematics
showing in vivo rescue of exosome effect by pre-incubation of tumour cells
with astrocyte-derived exosomes (k), IHC analyses of PTEN and exosome
marker expression (l) and changes of tumour volume (m) (mean ± s.d., t-test,
n = 8, P = 0.0091).
decreased the transfer of miR-19a from astrocytes to tumour cells and
restored the PTEN mRNA levels (Fig. 3f, g). Furthermore, we intra- 
cranially injected Rab27a/b short hairpin RNA (shRNA) lentiviruses to 
block exosome secretion in mouse brain parenchyma (brain metastasis 
stroma), and then inoculated B16BL6 melanoma cells to the same sites 
(Fig. 3h). Inhibiting Rab27a/b reduced TSG101+ and CD63+ exosomes,
blocked PTEN downregulation in tumour lesions (Fig. 3i and
Extended Data Fig. 5d–g), and significantly decreased tumour outgrowth
(Fig. 3j). Conversely, intracranial co-injection of tumour cells
with astrocyte-derived exosomes (Fig. 3k) rescued PTEN downregulation
in tumour cells (Fig. 3l) and metastatic outgrowth (Fig. 3m) in
mouse brains injected with Rab27a/b shRNA (Extended Data Fig. 5h, i). 
Collectively, exosome-mediated miR-19a transfer from astrocytes to 
tumour cells is critical for tumour PTEN downregulation and aggressive
outgrowth in the brain. 

We next explored how PTEN loss promotes brain metastasis. 
Doxycycline-inducible PTEN knockdown (Extended Data Fig. 6a) 
before intracarotid injection did not alter tumour cell extravasation 
into the brain parenchyma (Extended Data Fig. 6b, c). To test whether 
restoring PTEN expression after tumour cell extravasation inhibits 
metastatic outgrowth, we selected subclones of human breast carcin- 
oma cells that selectively metastasize to the brain (MDA-MB-231Br) 
stably expressing either a doxycycline-inducible PTEN-coding 
sequence without the 3'-UTR miRNA binding sites, or red fluorescent
protein (RFP) controls (Fig. 4a and Extended Data Fig. 6d). PTEN 
induction 7 days post-intracarotid injection after extravasation of 
Figure 4 | Brain-dependent PTEN loss instigates metastatic microenvironment
to promote metastatic cell outgrowth. a, Prolonged mouse survival
by restoration of PTEN expression. Top, doxycycline (Dox)-inducible RFP 
(left) and PTEN expression (right) in 231Br cells. Middle, schematic of brain 
metastasis assay with doxycycline-induced RFP or PTEN expression. Bottom, 
overall survival of mice bearing brain metastases of 231Br cells with induced 
PTEN re-expression or RFP expression (log-rank test, n = 12, P < 0.0001).
b, Cytokine array of 231Br cells with doxycycline-induced RFP or PTEN 
expression. c, Overall survival of mice bearing brain metastases of 231Br cells 
transfected with control or CCL2 shRNAs (shControl or shCCL2, respectively) 
(log-rank test, n = 8, P = 0.027). d, Western blot analysis of NF-κB p65
nuclear translocation after knocking down PTEN. Cyto, cytosol; nuc, nuclear.
e, Histogram showing CCL2 mRNA levels detected by quantitative PCR after
PTEN knockdown with shRNA (shPTEN) (mean ± s.e.m., t-test, P < 0.001,
3 biological replicates, with 3 technical replicates each). f, Light and fluorescent
microscopy images and quantification of mCherry-labelled tumour cells
with or without BV2 microglia co-culture under 2-day serum starvation
(mean ± s.e.m., t-test, P = 0.031, 3 biological replicates, with 3 technical
replicates each). g, FACS analyses of Annexin V+ apoptotic zsGreen-labelled
231Br cells under doxorubicin treatment with or without BV2 microglia
co-culture (mean ± s.e.m., t-test, P = 0.004, 3 biological replicates).
h, Immunofluorescence staining of IBA1+ myeloid cells in brain metastases
of 231Br cells containing control (shControl) or CCL2 shRNA (shCCL2)
(mean ± s.e.m., t-test, P < 0.01, 3 biological replicates, with 3 technical
replicates each). i, j, IHC analyses showing decreased proliferation (Ki-67, i)
and increased apoptosis (TUNEL staining, j) in brain metastases after shRNA-mediated
CCL2 knockdown in vivo (mean ± s.e.m., t-test). k, PTEN and
CCL2 expression in matched primary breast tumours and brain metastases.
Left, representative IHC staining of PTEN and CCL2. Right, quantification of
PTEN and CCL2 expression in 35 cases of matched primary breast tumours
and brain metastases (mean ± s.d.).
tumour cells markedly extended the overall survival of brain metastases-bearing
mice (Fig. 4a and Extended Data Fig. 6e). Collectively,
PTEN loss primes brain metastasis outgrowth after tumour cell extra- 
vasation and PTEN restoration suppresses the outgrowth. 

Autocrine and paracrine signalling have decisive roles in metastasis 
seeding and outgrowth. Although PTEN restoration only led to a trend 
of reduced Akt and P70S6K phosphorylation (pAkt and pP70S6K, 
respectively; Extended Data Fig. 6f), cytokine array analyses revealed 
markedly reduced CCL2 secretion in PTEN-expressing tumour cells 
compared to controls (Fig. 4b); whereas PTEN knockdown increased 
CCL2 expression (Extended Data Fig. 6g). Moreover, the overall sur- 
vival of brain metastasis-bearing mice with CCL2-knockdown MDA- 
MB-231Br cells was significantly extended compared to controls 
(Fig. 4c and Extended Data Fig. 6h, i). Mechanistically, PTEN induction
decreased NF-κB p65 phosphorylation (Extended Data Fig. 7a, b)
along with reduced CCL2 secretion (Fig. 4b), whereas PTEN knockdown
increased p65 nuclear translocation, an indicator of NF-κB
activation, and CCL2 expression (Fig. 4d, e), partly through Akt activation
(Extended Data Fig. 7c). Furthermore, CCL2 mRNA and CCL2
protein expression in brain-seeking tumour cells was inhibited by the
NF-κB inhibitor pyrrolidine dithiocarbamate (PDTC) (Extended Data
Fig. 7d–f), indicating that NF-κB activation is crucial for PTEN-loss-induced
CCL2 upregulation.

CCL2 is a chemo-attractant during inflammation. CCL2 receptor
(CCR2)-expressing brain-derived IBA1-positive (IBA1+) primary
myeloid cells and BV2 microglial cells (Extended Data Fig. 8a, b)
migrate towards CCL2, which was blocked by CCR2 antagonists
(Extended Data Fig. 8c, d). Functionally, co-culturing with BV2 cells
enhanced proliferation and inhibited apoptosis of breast cancer cells
(Fig. 4f, g). In vivo, CCL2-knockdown brain metastases had decreased
IBA1+/CCR2+ myeloid cell infiltration (Fig. 4h), corresponding to
their reduced proliferation and increased apoptosis (Fig. 4i, j). 
Furthermore, IHC staining of human primary breast tumours and 
matched brain metastases for PTEN and CCL2 (Figs 1b and 4k, 
respectively) revealed a significantly (P = 0.027) higher CCL2 express- 
ion in brain metastases than in primary tumours (Extended Data 
Fig. 9a). Importantly, severe PTEN loss in brain metastases corresponded
to higher CCL2 expression (Extended Data Fig. 9b),
which significantly correlated with IBA1+ myeloid cell recruitment
(Extended Data Fig. 9c), validating that PTEN downregulation in
brain metastatic tumour cells contributes to CCL2 upregulation and
IBA1+ myeloid cell recruitment in clinical brain metastases.

Taken together, our data unveiled a complex reciprocal commun- 
ication between metastatic tumour cells and their TME, which primes 
the successful outgrowth of cancer cells to form life-threatening meta- 
stases (Extended Data Fig. 10). Beyond a tumour cell autonomous view 
of metastasis, our findings highlighted an important plastic and tissue- 
dependent nature of metastatic tumour cells, and a bi-directional 
co-evolutionary view of the ‘seed and soil’ hypothesis. Notably, 
although clinical application of CCL2 inhibitors for metastasis treatment
requires careful design, our data of brain metastasis inhibition
by stable ablation of PTEN-loss-induced CCL2 demonstrated the 
potential of CCL2-targeting for therapeutic intervention of life- 
threatening brain metastases. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references unique 
to these sections appear only in the online paper. 
METHODS


Reagents and cell culture. All common chemicals were from Sigma. Pyrro- 
lidinedithiocarbamic acid was from Santa Cruz Biotechnology. Exo-FBS 
exosome-depleted FBS was purchased from System Biosciences (SBI). PTEN 
(9188), pAkt(T308) (9275), pAkt(S473) (4060), Pan Akt (4691), and Bim (2933) 
antibodies were from Cell Signaling. CD9 (ab92726), Rab27a (ab55667), AMPK 
(ab3759), CCL2 (ab9899), MAP2 (ab11267), and pP70S6K (ab60948) antibodies 
were from Abcam. Tsg101 (14497-1-AP) and Rab27b (13412-1-AP) antibodies
were from Proteintech. CD81 (104901) antibody was from BioLegend. E2F1 
(NB600-210) and CCR2 (NBP1-48338) antibodies were from Novus. GFAP 
(Z0334) antibody was from DAKO. IBA1 antibody was from WAKO. Cre 
(969050) antibody was from Novagen. NF-κB p65 (SC-109) and CD63 (SC-15363)
antibodies were from Santa Cruz. DMA (sc-202459) and CCR2 antagonist
(sc-202525) were from Santa Cruz. MK2206 (S1078) was from Selleckchem. 
PDTC (P8765) was from Sigma-Aldrich. Human breast cancer cell lines (MDA- 
MB-231, HCC1954, BT474 and MDA-MB-435) and mouse cell lines (B16BL6 
mouse melanoma and 4T1 mouse breast cancer) were purchased from ATCC and 
verified by the MD Anderson Cancer Center (MDACC) Cell Line Characterization
Core Facility. All cell lines have been tested for mycoplasma contamination.
Primary glia was isolated as described. In brief, after homogenization of
dissected brain from postnatal day (P)0–P2 neonatal mouse pups, all cells were
seeded on poly-D-lysine coated flasks. After 7 days, flasks with primary culture
were placed on an orbital shaker and shaken at 230 r.p.m. for 3 h. Warm DMEM
10:10:1 (10% of fetal bovine serum, 10% of horse serum, 1% penicillin/streptomycin)
was added and flasks were shaken again at 260 r.p.m. overnight. After shaking,
fresh trypsin was added into the flask and leftover cells were plated with warm
DMEM 5:5:1 (5% of fetal bovine serum, 5% of horse serum, 1% penicillin/streptomycin)
to establish primary astrocyte culture. More than 90% of isolated primary
glial cells were GFAP+ astrocytes. Primary CAFs were isolated by digesting the
mammary tumours from MMTV-neu transgenic mouse. 231-xenograft CAFs 
were isolated by digesting the mammary tumours from MDA-MB-231 xenograft. 
For the mixed co-culture experiments, tumour cells were mixed with an equal 
number of freshly isolated primary glia, CAFs or NIH3T3 fibroblast cells in six- 
well plate (1:3 ratio). Co-cultures were maintained for 2-5 days before magnetic- 
bead-based separation. For the trans-well co-culture experiments, tumour cells 
were seeded in the bottom well and freshly isolated primary glia, CAFs or NIH3T3 
cells were seeded on the upper insert (1:3 ratio). Co-cultures were maintained for 
2-5 days for the further experiments. Lentiviral-based packaging vectors 
(Addgene), pLKO.1 PTEN-targeting shRNAs and all siRNAs (Sigma), Human 
Cytokine Antibody Array 3 (Ray biotech), and lentiviral-based vector pTRIPZ- 
PTEN and pTRIPZ-CCL2 shRNAs (MDACC shRNA and ORFome Core, from 
Open Biosystems) were purchased. The human PTEN-targeting shRNA sequences 
in the lentiviral constructs were: 5'-CCGGAGGCGCTATGTGTATTATT 
ATCTCGAGATAATAATACACATAGCGCCTTTTTT-3’ (targeting coding 
sequence); 5’-CCGGCCACAAATGAAGGGATATAAACTCGAGTTTATAT 
CCCTTCATTTGTGGTTTTT-3’ (targeting 3’-UTR). The human PTEN- 
targeting siRNA sequences used were: 5'-GGUGUAAUGAUAUGUGCAU-3’ 
and 5'-GUUAAAGAAUCAUCUGGAU-3’. The human CCL2-targeting siRNA 
sequences used were: 5’-CAGCAAGUGUCCCAAAGAA-3’ and 5’-CCGAAGA 
CUUGAACACUCA-3’. The mouse Rab27a-targeting siRNA sequences used 
were: 5’-CGAUUGAGAUGCUCCUGGA-3’ and 5’-GUCAUUUAGGGAUCC 
AAGA-3'. Mouse pLKO shRNA (shRab27a: TRCN0000381753; shRab27b: 
TRCN0000100429) were purchased from Sigma. For lentiviral production, lentiviral
expression vector was co-transfected with the third-generation lentivirus
packaging vectors into 293T cells using Lipo293 DNA in vitro Transfection Reagent
(SignaGen). Then, 48–72 h after transfection, cancer cell lines were stably infected
with viral particles. Transient transfection with siRNA was performed using
pepMute siRNA transfection reagent (SignaGen). For in vivo intracranial virus
injection, lentivirus was collected from 15 cm plates 48 h after transfection of
packaging vectors. After passing a 0.45 µm filter, all viruses were centrifuged at
25,000 r.p.m. (111,000g) for 90 min at 4°C. Viral pellet was suspended in PBS
(~200-fold concentrated). The final virus titre (~1 X 10? UT ml~!) was con- 
firmed by limiting dilution. 

Isolation of tumour cells from co-culture. Cell isolation was performed based on 
the magnetic bead-based cell sorting protocol according to manufacturer’s recom- 
mendation (Miltenyi Biotec Inc.). After preparation of a single-cell suspension, 
tumour cells (HCC1954 or BT474) were stained with primary EpCAM-FITC 
antibody (130-098-113) (50 µl per 10⁷ total cells) and incubated for 30 min in
the dark at 4°C. After washing, the cell pellet was re-suspended and anti-FITC
microbeads (50 µl per 10⁷ total cells) were added before loading onto the magnetic
column of a MACS separator. The column was washed twice and removed from 
the separator. The magnetically captured cells were flushed out immediately by 
firmly applying the plunger. The isolated and labelled cells were analysed on a 
Gallios flow cytometer (Beckman Coulter). For EpCAM-negative MDA-MB-231
tumour cells, FACS sorting (ARIAII, Becton Dickinson) was used to isolate green
fluorescent protein (GFP)+ tumour cells from glia or CAFs.

Isolation of CD11b+ cells from mouse primary glia. Isolation of primary glia
was achieved by homogenization of dissected brain from P0–P2 mouse pups. After
7 days, trypsin was added and cells were collected. After centrifugation and re-suspension
of cell pellet to a single-cell suspension, cells were incubated with
CD11b+ microbeads (Miltenyi Biotec) (50 µl per 10⁷ total cells) for 30 min at
4°C. The cells were washed with buffer and CD11b+ cells were isolated by
MACS Column. CD11b+ cells were analysed by flow cytometry and immunofluorescence
staining.

Western blotting. Western blotting was done as previously described. In brief, 
cells were lysed in lysis buffer (20 mM Tris, pH 7.0, 1% Triton X-100, 0.5% NP-40, 
250 mM NaCl, 3 mM EDTA and protease inhibitor cocktail). Proteins were separated
by SDS-PAGE and transferred onto a nitrocellulose membrane. After membranes
were blocked with 5% milk for 30 min, they were probed with various
primary antibodies overnight at 4°C, followed by incubation with secondary
antibodies for 1 h at room temperature, and visualized with enhanced chemiluminescence
reagent (Thermo Scientific).

qRT-PCR. In brief, total RNA was isolated using miRNeasy Mini Kit (Qiagen) 
and then reverse transcribed using reverse transcriptase kits (iScript cDNA syn- 
thesis Kit, Bio-rad). SYBR-based qRT-PCR was performed using pre-designed 
primers (Life Technologies). miRNA assay was conducted using Taqman miRNA 
assay kit (Life Technologies). For quantification of gene expression, real-time PCR 
was conducted using Kapa Probe Fast Universal qPCR, and SYBR Fast Universal 
qPCR Master Mix (Kapa Biosystems) on a StepOnePlus real-time PCR system 
(Applied Biosystems). The relative expression of mRNAs was quantified by 
2^(−ΔΔCt) with logarithm transformation. Primers used in qRT-PCR analyses are:
mouse Ccl2: forward, 5'-GTTGGCTCAGCCAGATGCA-3'; reverse, 5'-AGCCTACTCATTGGGATCATCTTG-3'.
Mouse Actb: forward, 5'-AGTGTGACGTTGACATCCGT-3'; reverse, 5'-TGCTAGGAGCCAGAGCAGTA-3'.
Mouse Pten: forward, 5'-AACTTGCAATCCTCAGTTTG-3'; reverse, 5'-CTACTTTGATATCACCACACAC-3'.
Mouse Ccr2 primer: Cat. 4351372, ID Mm04207877_m1 (Life Technologies).
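
The relative-quantification step can be illustrated with a minimal sketch of the comparative Ct calculation. The function names and the Ct values are hypothetical; the use of a single reference gene for normalization and of base 2 for the logarithm transformation are assumptions, since the text specifies neither.

```python
import math


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the comparative Ct (2^-ddCt) method.

    Each argument is a mean Ct value; 'ref' is the reference gene
    (assumed here, not stated in the Methods).
    """
    d_ct_sample = ct_target_sample - ct_ref_sample      # dCt in treated sample
    d_ct_control = ct_target_control - ct_ref_control   # dCt in control sample
    dd_ct = d_ct_sample - d_ct_control                  # ddCt
    return 2.0 ** (-dd_ct)


def log2_relative_expression(*ct_values):
    """Log-transformed fold change (base 2 is an assumption)."""
    return math.log2(relative_expression(*ct_values))


# Hypothetical Ct values: the target crosses threshold two cycles later
# (relative to the reference gene) in the sample than in the control.
fold = relative_expression(24.0, 18.0, 22.0, 18.0)
log_fold = log2_relative_expression(24.0, 18.0, 22.0, 18.0)
```

With these illustrative values the target is fourfold lower in the sample than in the control, which corresponds to a log2 value of −2.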

miRNA labelling and transfection. Synthetic miRNAs were purchased from 
Sigma and labelled with Cy3 by Silencer siRNA labelling kit (Life Technologies). 
In brief, miRNAs were incubated with labelling reagent for 1 h at 37 °C in the dark, 
and then labelled miRNAs were precipitated by ethanol. Labelled miRNAs (100 
pmoles) were transfected into astrocytes or CAFs in a 10-cm plate. After 48 h,
astrocytes and CAFs containing Cy3-miRNAs were co-cultured with tumour cells 
(at 5:1 ratio). 

PTEN promoter methylation analysis and luciferase reporter assay of PTEN 
promoter activity. Genomic DNA was isolated by PureLink Genomic DNA Mini
Kit (Invitrogen); bisulfite conversion was performed by EpiTect Bisulphite Kit and
followed by EpiTect methylation-specific PCR (Qiagen). Primers for PTEN CpG
island are 5'-TGTAAAACGACGGCCAGTTTGTTATTATTTTTAGGGTTGGGAA-3'
and 5'-CAGGAAACAGCTATGACCCTAAACCTACTTCTCCTCAACAACC-3'.
Luciferase reporter assays were done as previously described. The
wild-type PTEN promoter driven pGL3-luciferase reporter was a gift from
A. Yung. The pGL3-PTEN reporter and a control Renilla luciferase vector were
co-transfected into tumour cells by Lipofectamine 2000 (Life Technologies). After
48 h, tumour cells were co-cultured with astrocytes or CAFs. Another 48 h later,
luciferase activities were measured by Dual-Luciferase Reporter Assay Kit
(Promega) on Luminometer 20/20 (Turner Biosystems). The PTEN 3'-UTRs with
various miRNA binding-site mutations were generated by standard PCR- 
mediated mutagenesis method and inserted downstream of luciferase reporter 
gene in pGL3 vector. The activities of the luciferase reporter with the wild-type 
and mutated PTEN 3’-UTRs were assayed as described above. 

Exosome isolation and purification. Astrocytes or CAFs were cultured for 48–72 h
and exosomes were collected from their culture media after sequential ultracentrifugation
as described previously. In brief, cells were collected, centrifuged at
300g for 10 min, and the supernatants were collected for centrifugation at 2,000g
for 10 min, 10,000g for 30 min. The pellet was washed once with PBS and purified
by centrifugation at 100,000g for 70 min. The final pellet containing exosomes was
re-suspended in PBS and used for (1) transmission electron microscopy by fixing
exosomes with 2% glutaraldehyde in 0.1 M phosphate buffer, pH 7.4; (2) measurement
of total exosome protein content using BCA Protein Assay normalized by equal
number of primary astrocytes and CAF cells; (3) western blotting of exosome
marker proteins CD63, CD81 and Tsg101; and (4) qRT-PCR by extracting
miRNAs with miRNeasy Mini Kit (Qiagen). 

Transmission electron microscopy. Fixed samples were placed on 100-mesh 
carbon-coated, formvar-coated nickel grids treated with poly-L-lysine for about
30 min. After washing the samples on several drops of PBS, samples were
incubated on drops of buffered 1% glutaraldehyde for 5 min, and then washed
several times on drops of distilled water. Afterwards, samples were negatively
stained on drops of millipore-filtered aqueous 4% uranyl acetate for 5 min. Stain
was blotted dry from the grids with filter paper and samples were allowed to dry.
Samples were then examined in a JEM 1010 transmission electron microscope
(JEOL) at an accelerating voltage of 80 kV. Digital images were obtained using the
AMT Imaging System (Advanced Microscopy Techniques Corp.). 

Flow cytometry analysis of exosome marker proteins, Annexin V and CCR2. 
For exosome detection, 100 µl exosomes isolated from 10-ml conditioned media
of astrocytes or CAFs were incubated with 10 µl of aldehyde/sulfate latex beads
(4 µm diameter, Life Technologies) for 15 min at 4°C. After 15 min, PBS was
added to make sample volume up to 400 µl, which was incubated overnight at
4°C under gentle agitation. Exosome-coated beads were washed twice in FACS
washing buffer (1% BSA and 0.1% NaN3 in PBS), and re-suspended in 400 µl
FACS washing buffer, stained with 4 µg of phycoerythrin (PE)-conjugated anti-mouse
CD63 antibody (BioLegend) or mouse IgG (Santa Cruz Biotechnology) for
3 h at 4°C under gentle agitation and analysed on a FACS Canto II flow cytometer.
Samples were gated on bead singlets based on FSC and SSC characteristics
(4 µm diameter). For Annexin V apoptosis assay, after 24 h doxorubicin (2 µM)
treatment, the cells were collected, labelled by APC-Annexin V antibody
(BioLegend) and analysed on a FACS Canto II flow cytometer. CD11b+ and
BV2 cells were stained with CCR2 antibody (Novus) at 4°C overnight; they were
then washed and stained with Alexa Fluor 488 anti-rabbit IgG (Life Technologies)
at room temperature for 1 h. The cells were then analysed on a FACS Canto II
flow cytometer. 

In vivo experiments. All animal experiments and terminal endpoints were carried 
out in accordance with approved protocols from the Institutional Animal Care 
and Use Committee of the MDACC. Animal numbers of each group were calculated
by power analysis and animals were grouped randomly for each experiment.
No blinding of experiment groups was conducted. MFP tumours were established
by injection of 5 X 10° tumour cells in 100 µl of PBS:Matrigel mixture (1:1 ratio)
orthotopically into the MFP of 8-week-old Swiss nude mice as done previously.
Brain metastasis tumours were established by ICA injection of tumour cells
(250,000 cells in 0.1 ml HBSS for MDA-MB-231, HCC1954, MDA-MB-435,
4T1 and B16BL6, and 500,000 cells in 0.1 ml HBSS for BT474.m1) into the right
common carotid artery as done previously. Mice (6–8 weeks) were randomly
grouped into designated groups. Female mice are used for breast cancer experi- 
ments, both female and male are used for melanoma experiments. Since the brain 
metastasis model does not result in visible tumour burdens in living animal, the 
endpoints of in vivo metastasis experiments are based on the presence of clinical 
signs of brain metastasis, including but not limited to, primary central nervous 
system disturbances, weight loss, and behavioural abnormalities. Animals are 
culled after showing the above signs or 1-2 weeks after surgery based on specific 
experimental designs. Brain metastasis lesions are enumerated as experimental 
readout. Brain metastases were counted as micromets and macromets. The definitions
of micromets and macromets are based on a comprehensive mouse and
human comparison study previously published. In brief, ten haematoxylin and
eosin (H&E)-stained serial sagittal sections (300 µm per section) through the left
hemisphere of the brain were analysed for the presence of metastatic lesions. We
counted micrometastases (that is, those < 50 µm in diameter) to a maximum of
300 micrometastases per section, and every large metastasis (that is, those > 50 µm
in diameter) in each section. Brain-seeking cells from overt metastases and whole 
brains were dissected and disaggregated in DMEM/F-12 medium using Tenbroeck 
homogenizer briefly. Dissociated cell mixtures were plated on tissue culture dish. 
Two weeks later, tumour cells recovered from brain tissue were collected and
expanded as brain-seeking sublines (Br.1). For the astrocyte miR-19 knockout
mouse model, Mirc1tm1.1Tyj/J mice (Jax lab) (6–8 weeks) were intracranially
injected with Ad5-GFAP-Cre virus (Iowa University, Gene Transfer Vector
Core) 2 µl (MOI ~10° U µl⁻¹) per point, total four points at the right hemisphere
(n = 9). Control group (n = 7) was injected with the same dose Ad5-RSV-BGLuc
(Ad-BGLuc) at the right hemisphere. All intracranial injections were performed by
an implantable guide-screw system. One week after virus injection, mice were
intracarotidly injected with 2 × 10° B16BL6 tumour cells. After two weeks, whole
brains were dissected and fixed in 4% formaldehyde, and embedded in paraffin.
Tumour formation, histological phenotypes of H&E-stained sections, and IHC
staining were evaluated. Only parenchymal lesions in close proximity to the
adenovirus injection sites were included in our evaluation. Tumour size was calculated
as (longest diameter) × (shortest diameter)²/2. For the intracranial tumour model,
Mirc1tm1.1Tyj/J mice (Jax lab) (6–8 weeks) were intracranially injected as described
above. Seven mice were used in the experiment. One week later, these mice were
intracranially injected with 2.5 X 10° B16BL6 tumour cells at both sides where
adenoviruses were injected. After another week, whole brains were dissected and
fixed in 4% formaldehyde, and embedded in paraffin. Tumour formation and
phenotype were analysed as above.
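
The per-section counting rule for micrometastases and large metastases described above can be sketched as follows. This is an illustration only, not the scoring procedure actually used: the function name and the example diameters are hypothetical, and treating a lesion of exactly 50 µm as a large metastasis is an assumption, because the text defines only the < 50 µm and > 50 µm classes.

```python
MICROMET_CUTOFF_UM = 50.0        # diameter threshold from the Methods
MICROMET_CAP_PER_SECTION = 300   # maximum micrometastases counted per section


def count_section(lesion_diameters_um):
    """Return (micromets, macromets) for one H&E-stained section.

    Lesions below the cutoff are micrometastases, capped at 300 per
    section; all other lesions are counted individually as large
    metastases (a 50 um lesion is assigned there by assumption).
    """
    micro = sum(1 for d in lesion_diameters_um if d < MICROMET_CUTOFF_UM)
    macro = sum(1 for d in lesion_diameters_um if d >= MICROMET_CUTOFF_UM)
    return min(micro, MICROMET_CAP_PER_SECTION), macro


# Hypothetical measurements (um) from a single section.
micro, macro = count_section([10.0, 49.9, 50.1, 120.0])
```

Summing the per-section tuples over the ten serial sections would then give the per-animal readout.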

For the Rab27a/b knockdown mouse model, seven C57BL6 mice (Jax lab) 
(6-8 weeks) were intracranially injected with concentrated lentivirus containing 
shRab27a and shRab27b (ratio 1:2) 2 µl per point, total three points at the right
hemisphere; concentrated control lentivirus containing pLKO.1 scramble were 
injected at the left hemisphere. All intracranial injections were performed by an 
implantable guide-screw system. One week later, mice were intracranially injected 
with 5 X 10* B16BL6 tumour cells at both sides where they had been infected. 
After one week, whole brains were dissected and fixed in 4% formaldehyde, and 
embedded in paraffin. Tumour formation, histological phenotypes of H&E-stained
sections, and IHC staining were evaluated. When performing metastases size
quantification, only parenchymal lesions that were in close proximity to the adenovirus
injection sites were included in the analyses. Tumour size was calculated as
(longest diameter) × (shortest diameter)²/2. For exosome rescue experiments,
eight C57BL6 mice (Jax lab) (6–8 weeks) were intracranially injected with concentrated
lentivirus containing shRab27a and shRab27b (ratio 1:2) 2 µl per point, total
3 points at both hemispheres. One week later, these mice were intracranially
injected with 5 X 10* B16BL6 tumour cells with 10 µg exosome isolated from
astrocyte media at the right sides where they had been injected with lentivirus; 
5 X 10* B16BL6 tumour cells with vehicle were injected at the left sides where 
lentivirus had been injected. After another week, whole brains were dissected 
and fixed in 4% formaldehyde, and embedded in paraffin. Tumour formation 
and phenotype were analysed as above. 
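
As a worked example of the tumour size formula used in these experiments, (longest diameter) × (shortest diameter)²/2, the following minimal sketch computes a volume from two measured diameters. The function name, the millimetre units and the example measurements are hypothetical.

```python
def tumour_volume(longest_diameter, shortest_diameter):
    """Tumour size as (longest diameter) x (shortest diameter)^2 / 2.

    Both diameters must share one length unit (mm assumed here), so the
    result is in that unit cubed.
    """
    if longest_diameter < shortest_diameter:
        raise ValueError("longest diameter must be >= shortest diameter")
    return longest_diameter * shortest_diameter ** 2 / 2


# Hypothetical lesion measuring 4 mm by 3 mm.
volume_mm3 = tumour_volume(4.0, 3.0)
```

For a 4 mm × 3 mm lesion this gives 4 × 9 / 2 = 18 mm³.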

For in vivo extravasation assay, equal numbers of cells labelled with GFP-control 
shRNA and RFP-PTEN shRNA (Open Biosystems) were mixed and ICA injected. 
After cardiac perfusion, brains were collected and sectioned through coronal plane
on a vibratome (Leica) into 50-µm slices. Fluorescent cells were then counted. For
inducible PTEN expression in vivo, mice were given doxycycline (10 µg kg⁻¹) every
other day. To quantify brain metastasis incidence and tumour size, brains were
excised for imaging and histological examination at the end of experiments. Ten
serial sagittal sections every 300 µm throughout the brain were analysed by at least
two pathologists who were blinded to animal groups in all above analyses.

Reverse-phase protein array. Reverse-phase protein array of PTEN-overexpressing
cells was performed in the MDACC Functional Proteomics core facility. In
brief, cellular proteins were denatured by 1% SDS, serial diluted and spotted on 
nitrocellulose-coated slides. Each slide was probed with a validated primary anti- 
body plus a biotin-conjugated secondary antibody. The signal obtained was amp- 
lified using a Dako Cytomation-catalysed system and visualized by DAB 
colorimetric reaction. The slides were analysed using customized Microvigene 
software (VigeneTech Inc.). Each dilution curve was fitted with a logistic model 
(‘Super curve fitting’ developed at the MDACC) and normalized by median polish. 
Differential intensity of normalized log values of each antibody between RFP 
(control) and PTEN-overexpressed cells were compared in GenePattern
(http://genepattern.broadinstitute.org). Antibodies with differential expression (P < 0.2)
were selected for clustering and heat-map analysis. The data clustering was per- 
formed using GenePattern. 
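
The median-polish normalization mentioned above can be illustrated with a minimal sketch. It is not the core facility's pipeline. It assumes a table of log intensities with samples as rows and antibodies as columns (that layout, the function name and the stopping rule are illustrative assumptions) and applies Tukey's median polish, which splits the table into an overall level, row effects, column effects and residuals.

```python
from statistics import median


def median_polish(table, max_iter=10, tol=1e-9):
    """Tukey's median polish: table[i][j] = overall + row[i] + col[j] + resid[i][j].

    Assumed layout (hypothetical): rows are samples, columns are antibodies,
    entries are log-transformed intensities.
    """
    resid = [[float(v) for v in r] for r in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row = [0.0] * n_rows
    col = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep the row medians out of the residuals into the row effects.
        r_med = [median(r) for r in resid]
        for i in range(n_rows):
            resid[i] = [v - r_med[i] for v in resid[i]]
            row[i] += r_med[i]
        # Move the median of the column effects into the overall level.
        shift = median(col)
        col = [c - shift for c in col]
        overall += shift
        # Sweep the column medians out of the residuals into the column effects.
        c_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for i in range(n_rows):
            resid[i] = [resid[i][j] - c_med[j] for j in range(n_cols)]
        col = [col[j] + c_med[j] for j in range(n_cols)]
        # Move the median of the row effects into the overall level.
        shift = median(row)
        row = [r - shift for r in row]
        overall += shift
        # Stop once a full sweep no longer changes anything.
        if max(abs(m) for m in r_med) < tol and max(abs(m) for m in c_med) < tol:
            break
    return overall, row, col, resid
```

In this sketch the row effects play the role of per-sample loading differences, so subtracting them leaves values that are comparable across samples. Whether the published analysis corrected rows, columns or both is not stated in the text.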

Patient samples. Two studies in separate cohorts were conducted. The first one 
was a retrospective evaluation of PTEN in two cohorts. (1) Archived formalin- 
fixed and paraffin-embedded brain metastasis specimens (n = 131) from patients 
with a history of breast cancer who presented with metastasis to the brain par- 
enchyma and had surgery at the MDACC (Supplementary Information). Tissues 
were collected under a protocol (LAB 02-486) approved by the Institutional 
Review Board (IRB) at the MDACC. (2) Archived unpaired primary breast cancer 
formalin-fixed and paraffin-embedded specimens (n = 139) collected under an 
IRB protocol (LAB 02-312) at the MDACC (Supplementary Information). Formal
consent was obtained from all patients. The second study was a retrospective
evaluation of PTEN, CCL2 and IBA1 in the matched primary breast tumours
and brain metastatic samples from 35 patients, of whom 12 had HER2-positive,
14 triple-negative and nine oestrogen-receptor-positive tumours according
to clinical diagnostic criteria (Supplementary Information). Formalin-fixed,
paraffin-embedded primary breast and metastatic brain tumour samples were 
obtained from the Pathology Department, University of Queensland Centre for 
Clinical Research. Tissues were collected with approval by human research ethics 
committees at the Royal Brisbane and Women’s Hospital (2005/022) and the 
University of Queensland (2005000785). For tissue microarray construction, 
tumour-rich regions (guided by histological review) from each case were sampled 
using 1-mm cores. All the archival paraffin-embedded tumour samples were 
coded with no patient identifiers. 

IHC and immunofluorescence. Standard IHC staining was performed as 
described previously”*. In brief, after de-paraffinization and rehydration, 4-µm sections
were subjected to heat-induced epitope retrieval (0.01 M citrate for PTEN).
Slides were then incubated with various primary antibodies at 4 °C overnight, after
blocking with 1% goat serum. Slides underwent colour development with DAB and
haematoxylin counterstaining. Ten visual fields from different areas of each tumour 
were evaluated by two pathologists independently (blinded to experiment groups). 
Positive IBA1 and Ki-67 staining in mouse tumours were calculated as the percent- 
age of positive cells per field (%) and normalized by the total cancer cell number in 
each field. TUNEL staining was counted as the average number of positive cells per 
field (10 random fields). We excluded necrotic areas in the tumours from evaluation. 
Immunofluorescence was performed following the standard protocol recommended 
by Cell Signaling. In brief, after washing with PBS twice, cells were fixed with 4% 
formaldehyde. Samples were blocked with 5% normal goat serum in PBS for 1 h
before incubation with a primary antibody cocktail overnight at 4 °C, washed, then
incubated with secondary antibodies before examination using a confocal microscope.
Pathologists were blinded to the group allocation during the experiment and when 
assessing the outcome. 

Bioinformatics and statistical analysis. Publicly available GEO data sets 
GSE14020, GSE19184, GSE2603, GSE2034 and GSE12276 were used for bioinfor- 
matics analysis. The top 2X 10* verified probes were subjected to analysis. 
Differentially expressed genes between metastases from brain and other sites 
(primary or other metastatic organ sites) were analysed by SAM analysis in R 
statistical software. The 54 commonly downregulated genes in brain metastases 
from GSE14020 and GSE19184 were depicted as a heat-map by Java Treeview. For
staining of patient samples, we calculated the correlation by Fisher’s exact test. For 
survival analysis of GSE2603, the patient samples were mathematically separated 
into PTEN-low and -normal groups based on K-means (K = 2). Kaplan-Meier 
survival curves were generated with the survival package in R. Multiple-group IHC
scores were compared by chi-square test and Mantel-Haenszel test (mantelhaen.test) in R.
All quantitative experiments were repeated using at least three independent biological
repeats and are presented as mean ± s.e.m. or mean ± s.d. Quantitative data were
analysed either by one-way analysis of variance (ANOVA) (multiple groups) or 
t-test (two groups). P < 0.05 (two-sided) was considered statistically significant. 
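
The K-means (K = 2) split of samples into PTEN-low and PTEN-normal groups can be sketched as follows. This is a minimal one-dimensional illustration, not the authors' code. The function name, the min/max initialization and the example values are assumptions, and the text does not say which K-means implementation or starting centres were used.

```python
def two_means_split(values, max_iter=100):
    """One-dimensional k-means with K = 2.

    Returns (labels, cutoff); label 0 marks the low-expression group and
    label 1 the higher ("normal") group. Centres start at the minimum and
    maximum value, which is an illustrative choice.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("need at least two distinct values")
    cutoff = (lo + hi) / 2.0
    for _ in range(max_iter):
        # In one dimension the nearest-centre rule is a single threshold
        # halfway between the two centres.
        cutoff = (lo + hi) / 2.0
        low = [v for v in values if v <= cutoff]
        high = [v for v in values if v > cutoff]
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # the centres no longer move
        lo, hi = new_lo, new_hi
    labels = [0 if v <= cutoff else 1 for v in values]
    return labels, cutoff


# Hypothetical normalized PTEN probe intensities for six samples.
labels, cutoff = two_means_split([0.10, 0.20, 0.15, 1.00, 1.10, 0.90])
```

The two label groups would then be passed to the survival analysis as the PTEN-low and PTEN-normal strata.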
Extended Data Figure 1 | Organ-specific loss of PTEN in brain metastases.
a, Schematics of microarray analyses. Patients’ brain metastases exhibited a
discrete gene expression profile with 650 genes significantly downregulated
compared to bone or lung metastases (GSE14020). Cancer cells were
injected into immunodeficient mice to produce orthotopic primary tumours
(MDA-MB-231 cells for mammary tumour, PC14 for prostate tumour,
A375SM for melanoma) and experimental brain metastases (all three lines).
Brain metastases derived from these three cancer cell lines exhibited 2,161
commonly downregulated genes compared to their respective primary
tumours (GSE19184). PTEN is one of only 54 commonly downregulated
genes in brain metastases of both data sets. b, Heat-maps showing expression
of 54 commonly downregulated genes (see a) in clinical brain metastases versus
lung metastases and bone metastases. c, Heat-maps showing expression of
the 54 genes (see a) in cell-line-induced primary tumours versus experimental
brain metastases. d, Kaplan-Meier survival analyses showing no significant
differences in brain metastasis-free survival between breast cancer patients
with primary tumours expressing normal PTEN or low PTEN mRNA in GEO
cDNA microarray set GSE2603 (P = 0.74). e, PTEN mRNA levels detected
in primary breast tumours from patients with or without brain metastasis
relapse. Three GEO cDNA microarray data sets (GSE2034, GSE2603 and
GSE12276) with clinical annotation were analysed. Relative PTEN expression
levels were compared by t-test.


©2015 Macmillan Publishers Limited. All rights reserved 




Extended Data Figure 2 | PTEN expression in different metastatic organ 
microenvironments and in vitro culture condition. a, Breast cancer cell lines 
(MDA-MB-231, HCC1954, BT474, and MDA-MB-435) were cultured and 
injected either to the MFP to form primary tumour or intracarotidly to form 
brain metastases. Cell pellets and tumour tissues were stained for PTEN 
expression using anti-PTEN antibodies as described previously*. b, IHC 
staining of PTEN in brain metastases, paired lung metastases and primary 
tumour derived from either MDA-MB-231 or 4T1 cells. PTEN expression level 
was analysed based on an IRS scoring system. c, PTEN mRNA levels between 
parental MDA-MB-231 and CN-34 breast cancer cell lines (blue) and their 
brain-seeking sublines (red). Normalized PTEN-specific probe intensity values 
were extracted from cDNA microarray data set GSE12237. Dot plot shows 
the mean probe intensity derived from independent RNA samples. d, PTEN
qRT-PCR (mean ± s.e.m., t-test) and PTEN IHC in MDA-MB-231Br
secondary tumours and cultured cells (3 biological replicates, with 3 technical
replicates each). e, f, RT-PCR (e) and western blot (f) analysis of PTEN
mRNA expression (mean ± s.e.m., t-test) or protein expression in MDA-MB-231
cells after co-culture with either primary mouse CAFs isolated from MDA-MB-231
xenograft tumours or primary mouse glia isolated from mouse
brain (3 biological replicates, with 3 technical replicates each). g, Representative
methylation-specific PCR of PTEN promoter and quantification under
co-culture with glia or CAF (mean ± s.e.m., t-test, 2 biological replicates, with
2 technical replicates each). h, PTEN promoter activity measured by luciferase
reporter in HCC1954 cells after co-culture with either CAF or glia cells for
48 h (mean ± s.e.m., t-test, P = 0.5271, 3 biological replicates, with 3 technical
replicates each). 
Extended Data Figure 3 | Cre-mediated depletion of PTEN-targeting
microRNAs in astrocytes. a, IHC analyses of the expression of AMP-activated
protein kinase (AMPK), pro-apoptotic protein BIM and transcription factor
E2F1 (mean ± s.d., t-test) in brain metastasis tumours with/without pre-knocking
out the miR-17~92 cluster in the brain microenvironment. B, brain
tissue; M, brain metastases. b, Schematic of experimental design. The Ad-GFAP-Cre
adenovirus was injected intracranially to the right hemisphere of the
Mirc1tm1.1Tyj/J mouse, and the control adenovirus (Ad-BGLuc) was injected
intracranially to the contralateral side of the brain. B16BL6 cells were then injected
intracranially to both sides. c, IHC analysis of Cre expression in the brain
astrocytes. d, IHC analysis of PTEN expression in the tumour cells.
e, Quantification of PTEN expression in tumour cells (mean ± s.d., t-test).
f, Quantification of intracranial tumour outgrowth by volume (mean ± s.e.m.,
t-test). g, RT-PCR analyses of miR-19a and PTEN mRNA in tumour cell
HCC1954 after 48 h co-culture with primary astrocytes from Mirc1tm1.1Tyj/J
mice pre-infected (48 h) with adenovirus (Ad-BGLuc or Ad-GFP-Cre)
(mean ± s.e.m., t-test, 3 biological replicates, with 3 technical replicates each).
h, Western blot of PTEN protein in the indicated tumour cells co-cultured as in
g. i, Knockdown of miR-17~92 allele in cultured primary astrocytes. The miR-17~92
cluster is flanked by loxP sites in the Mirc1tm1.1Tyj/J mouse. Primary
astrocytes were isolated from Mirc1tm1.1Tyj/J mouse brain then infected by
adenovirus encoding for BGLuc or GFP-Cre protein. Concentrated adenovirus
particles of indicated volume (same MOI ~10°U ml!) encoding BGLuc or 
GFP-Cre proteins were added to 10° astrocytes. Left, representative image 
showing the infection efficiency. Right, bar diagram showing the relative miR- 
19a expression (one of the five miRNA genes in the miR-17~92 cluster) three 
days after adenovirus infection (mean ± s.e.m., t-test, 3 biological replicates,
with 3 technical replicates each). 
Extended Data Figure 4 | Contact-independent downregulation of PTEN
in tumour cells by miR-19a from astrocyte-derived exosomes. a, Flow
cytometric detection of Cy3-miR-19a and FITC-EpCAM in tumour cells
60 h after co-culture with Cy3-miR-19a-transfected astrocytes and CAFs.
b, c, Tumour cells were co-cultured with conditioned media from astrocytes or
CAFs for 60 h. RT-PCR analyses of the PTEN-targeting miR-19a level (b) and
PTEN mRNA level (c) in tumour cells (mean ± s.e.m., t-test, 3 biological
replicates, with 3 technical replicates each). d, Western blot detecting PTEN
protein levels in HCC1954 cells after culture with conditioned media from
either astrocytes or CAFs for 60 h. e, Flow cytometry detecting CD63⁺
exosomes extracted from CAF- or astrocyte-conditioned media. f, Histogram
showing the exosome protein level detected from CAF- and astrocyte-conditioned
media normalized by cell number (mean ± s.e.m., t-test,
P < 0.0001, 3 biological replicates, with 3 technical replicates each). g, RT-PCR
analyses of miR-19a level in exosomes extracted from CAF- or astrocyte-conditioned
media normalized by equal cell numbers (mean ± s.e.m., t-test,
P < 0.0001, 3 biological replicates, with 3 technical replicates each).
Extended Data Figure 5 | Inhibition of exosome release by DMA, Rab27a
siRNA or Rab27 shRNAs. a, Exosome-releasing inhibitor (DMA) treatment
reduced exosome secretion from astrocytes compared to vehicle-treated
astrocytes. Astrocytes were treated with DMA (25 µg ml⁻¹) or vehicle for 4 h;
exosomes were concentrated from astrocyte-conditioned media and total
proteins from exosomes were examined by BCA assay (normalized to total cell
numbers) (mean ± s.e.m., t-test, P = 0.038, 3 biological replicates, with
3 technical replicates each). b, Knockdown of Rab27a in astrocytes by siRNA.
Two siRNAs targeting mouse Rab27a were transiently transfected into
astrocytes, and the Rab27a mRNA level was examined by RT-PCR 48 h after
transfection (mean ± s.e.m., t-test, P < 0.01, 3 biological replicates, with
3 technical replicates each). c, Knocking down Rab27a in astrocytes inhibited
exosome release. Forty-eight hours after Rab27a-targeting siRNAs were
transfected, exosomes were collected from astrocyte-conditioned media and
total proteins from exosomes were examined by BCA assay (normalized to
total cell numbers) (mean ± s.e.m., t-test, 3 biological replicates, with 3
technical replicates each). d, Histogram showing relevant changes of Rab27a
and Rab27b mRNA level in primary astrocytes infected with pLKO.shRab27a
or pLKO.shRab27b virus (mean ± s.e.m., t-test, P < 0.001, 3 biological
replicates, with 3 technical replicates each). e, Change of exosome protein level
detected in the conditioned media from astrocytes infected by pLKO.shRab27a
or pLKO.shRab27b virus by BCA assay (normalized to total cell numbers)
(mean ± s.e.m., t-test, P < 0.001, 3 biological replicates, with 3 technical
replicates each). f, g, IHC analysis showing the expression level of Rab27a and
Rab27b (f) and exosome marker expression CD63 (g) in the brain tissue derived
from mice injected with control lentivirus or Rab27a/b shRNA lentiviruses and
subsequently intracranially injected with B16BL6 cells. h, i, IHC analysis
showing the expression level of Rab27a and Rab27b (h) and exosome marker
expression CD63 (i) in the brain tissue derived from mice injected with Rab27a/b
shRNA lentiviruses and subsequently intracranially injected with B16BL6 cells
and vehicle at the left side or B16BL6 cells and astrocyte-derived exosomes at the
right side.
Extended Data Figure 6 | Brain extravasation of MDA-MB-231 parental
cells with or without induction of doxycycline-inducible PTEN shRNA
knockdown, PTEN expression and CCL2 shRNA knockdown. a, Western
blot showing PTEN expression levels after treating MDA-MB-231 cells
with doxycycline. MDA-MB-231 cells were stably infected with inducible
shRNA expression vectors (pTRIPZ-control-shGFP as control and pTRIPZ-shRNA-RFP
for PTEN shRNA). Doxycycline (1 µg ml⁻¹) was added to induce
shRNA expression for 5 days. As indicated, doxycycline was withdrawn in
some samples for another 5 days before analysis. b, Schematics of in vivo
extravasation assay. shControl-GFP and shPTEN-RFP cells were mixed at a 1:1
ratio. In total, 200,000 cells were ICA injected into mice, and doxycycline
(50 µg kg⁻¹) was given intraperitoneally daily. Brains were collected 5 days after
ICA injection. c, Dot plot of extravasated cell counts 5 days after ICA injection
of indicated MDA-MB-231 sublines. Tumour-bearing brains were collected
and sectioned into 100 µm coronal slices. Extravasated tumour cells were
counted under the fluorescence microscope (mean ± s.d., t-test). d, MDA-MB-231Br
single cells were expanded into subclones (C12, C14, C18 and C19),
which were transfected with doxycycline-inducible pTRIPZ-RFP or pTRIPZ-PTEN.
48 h after doxycycline (1 µg ml⁻¹) treatment, PTEN induction was
tested by western blotting. The C14 clone was used for further in vivo assays
(see e, f and Fig. 4a). e, IHC staining of induced PTEN expression in brain
metastases derived from mice injected with MDA-MB-231Br (231Br-RFP
or 231Br-PTEN) cells. f, IHC analysis of PTEN downstream signalling
pathway, including phosphorylated pAkt(T308), pAkt(S473) and
pP70S6K(T389+T412) in brain metastases from mice injected with 231Br-RFP
or 231Br-PTEN cells. Top, dot plot of IHC data quantification by IRS
(mean ± s.d., t-test); bottom, representative IHC staining data. g, Histograms
of PTEN and CCL2 mRNA levels (mean ± s.e.m., t-test) in indicated cancer cell
lines 48 h after transfection with control or PTEN siRNAs (3 biological
replicates, with 3 technical replicates each). h, Histogram showing the inducible
CCL2 knockdown. MDA-MB-231Br cells were stably infected with pTRIPZ-inducible
CCL2 shRNAs. 48 h after doxycycline (1 µg ml⁻¹) treatment, CCL2
mRNA was examined by RT-PCR (mean ± s.e.m., t-test, 3 biological
replicates, with 3 technical replicates each). i, Doxycycline-induced CCL2
knockdown in brain metastases. Mice were ICA injected with MDA-MB-231Br
cells containing control or CCL2 shRNAs. Doxycycline (50 µg kg⁻¹) was given
to mice intraperitoneally daily after injection. IHC staining of CCL2 expression
levels in brain metastases derived from MDA-MB-231Br cells. T, brain
metastasis tumours at day 30 after ICA injection.
Extended Data Figure 7 | PTEN-regulated CCL2 expression through the
NF-κB pathway. a, Heat-map showing differentially expressed protein
markers of reverse-phase protein array analysis. MDA-MB-231Br cells were
stably infected with pTRIPZ-RFP or pTRIPZ-PTEN (231Br-RFP or 231Br-PTEN)
and induced by doxycycline (1 µg ml⁻¹) for 48 h. b, Box chart showing
the absolute intensity of PTEN and NF-κB p65(S536). c, Western blot
analysis of NF-κB p65 nuclear translocation, after cells were treated with Akt
inhibitor MK2206 (10 µg ml⁻¹) 24 h before separation into cytosolic (Cyto)
and nuclear (Nuc) fractions. d, Western blot analysis of NF-κB p65 nuclear
translocation, after cells were treated with NF-κB inhibitor PDTC (0.2 mM)
16 h before separation into cytosolic and nuclear fractions. e, Relative CCL2
mRNA expression after NF-κB inhibitor PDTC treatment analysed by qRT-PCR
(mean ± s.e.m., t-test, 3 biological replicates, with 3 technical replicates
each). Cells were treated with PDTC (0.2 mM) for 16 h. f, Relative CCL2 protein
expression after PDTC treatment analysed by ELISA (mean ± s.e.m., t-test,
3 biological replicates, with 3 technical replicates each). Cells were treated
as in e.
Extended Data Figure 8 | CCR2-mediated IBA1⁺ myeloid cell directional
migration. a, Co-expression of IBA1 and CCR2 on myeloid cells freshly
isolated from mouse brain by CD11b beads. Representative immunofluorescence
staining of IBA1 (left). FACS analysis of CD11b⁺ cells for CCR2
expression. b, Relative CCR2 expression in the BV2 microglia cell line
compared with NIH3T3 fibroblasts. CCR2 mRNA level analysed by qRT-PCR
(mean ± s.e.m., t-test, 3 biological replicates, with 3 technical replicates each)
(left) and protein expression analysed by FACS (right). c, Transwell
migration assay examining the directional migration of BV2 cells towards
CCL2. In total, 10° BV2 cells were seeded in the top chamber of the transwell
units, and CCL2 or BSA (20 ng ml⁻¹) was added into serum-free media in
the bottom chamber. The migrated cell numbers were counted at 24 h. Next,
CCR2 antagonists with different concentrations (10 µM, 1 µM and 0.1 µM)
were added into the top chamber with BV2 cells, and CCL2 (20 ng ml⁻¹) was
added into serum-free media in the bottom chamber. The migrated cell
numbers were counted at 24 h. d, Quantification of BV2 cell migration assay
(mean ± s.e.m., t-test, 3 biological replicates, with 3 technical replicates each).
Extended Data Figure 9 | The association between PTEN, CCL2 expression
and recruitment of IBA1⁺ myeloid cells in patients’ brain metastases and
matched primary breast tumours. a, Summary histogram of CCL2 protein
levels in primary breast tumours and matched brain metastases from 35
patients. Chi-square test was used to compare the IHC score in primary breast
tumours versus matched brain metastases. P < 0.05 is defined as significantly
different. b, Tables showing IHC scores of PTEN and CCL2 expression in
primary breast tumours and matched brain metastases. c, Representative
IHC staining of CCL2 proteins and IBA1⁺ myeloid cells in patients’ brain
metastases, and the correlation plot showing the Pearson correlation between
CCL2 and IBA1 staining in patients’ brain metastases (R = 0.371, P = 0.028).




Extended Data Figure 10 | PTEN loss induced by astrocyte-derived
exosomal microRNA primes brain metastasis outgrowth via functional
cross-talk between disseminated tumour cells and brain metastatic
microenvironment. Top, disseminated tumour cells extravasate into the brain.
a-c, Exosomes secreted by astrocytes in the brain microenvironment transfer
PTEN-targeting miRNA into extravasated brain metastatic tumour cells,
leading to PTEN downregulation in tumour cells. c, d, PTEN loss in brain
metastatic tumour cells increases their CCL2 secretion, facilitating the
recruitment of IBA1⁺/CCR2⁺ myeloid cells at the micrometastasis site.
d, e, The recruited IBA1⁺ myeloid cells enhance proliferation and inhibit
apoptosis of metastatic tumour cells, and promote metastatic outgrowth.
Autophagy mediates degradation of nuclear lamina


Zhixun Dou, Caiyue Xu, Greg Donahue, Takeshi Shimi, Ji-An Pan, Jiajun Zhu, Andrejs Ivanov, Brian C. Capell,
Adam M. Drake, Parisha P. Shah, Joseph M. Catanzaro, M. Daniel Ricketts, Trond Lamark, Stephen A. Adam,
Ronen Marmorstein, Wei-Xing Zong, Terje Johansen, Robert D. Goldman, Peter D. Adams & Shelley L. Berger


Macroautophagy (hereafter referred to as autophagy) is a cata- 
bolic membrane trafficking process that degrades a variety of 
cellular constituents and is associated with human diseases¹⁻³.
Although extensive studies have focused on autophagic turnover of 
cytoplasmic materials, little is known about the role of autophagy in 
degrading nuclear components. Here we report that the autophagy 
machinery mediates degradation of nuclear lamina components 
in mammals. The autophagy protein LC3/Atg8, which is involved 
in autophagy membrane trafficking and substrate delivery*°, 
is present in the nucleus and directly interacts with the nuclear 
lamina protein lamin B1, and binds to lamin-associated domains on 
chromatin. This LC3-lamin B1 interaction does not downregulate 
lamin B1 during starvation, but mediates its degradation upon 
oncogenic insults, such as by activated RAS. Lamin B1 degradation 
is achieved by nucleus-to-cytoplasm transport that delivers lamin 
B1 to the lysosome. Inhibiting autophagy or the LC3-lamin B1 
interaction prevents activated RAS-induced lamin B1 loss and 
attenuates oncogene-induced senescence in primary human cells. 
Our study suggests that this new function of autophagy acts as a 
guarding mechanism protecting cells from tumorigenesis. 

Several mammalian autophagy proteins are present in the nucleus, 
including LC3 (refs 7, 8), Atg5 (ref. 9), and Atg7 (ref. 10). However, 
whether nuclear LC3 is involved in degrading nuclear components 
is not understood. We investigated LC3 distribution by subcellular 
fractionation of primary human IMR90 cells and found a substantial 
amount of endogenous LC3 and a small amount of lipidated LC3-II 
in the nucleus (Fig. 1a). We used bacterially purified glutathione 
S-transferase (GST)-LC3B (hereafter ‘LC3’ unless specified otherwise) 
to pull down the nuclear fraction (Fig. 1b). One protein that we found 
to interact with LC3 is the nuclear lamina protein lamin B1 (Fig. 1b). 
The nuclear lamina is a fibrillar network located beneath the nuclear 
envelope whose major components are the four nuclear lamin iso- 
forms: lamins B1, B2, and A/C, and their associated proteins". Nuclear 
lamina provides the nucleus with mechanical strength and regulates 
higher-order chromatin organization, modulating gene expression and 
silencing! In contrast to lamin B1, lamins A/C and lamin B2 bind 
poorly, if at all, to LC3 (Fig. 1b). We detected a direct interaction of 
purified lamin B1 (Extended Data Fig. 1a) with LC3B (Fig. 1c) and 
other members of the Atg8 protein family, including LC3A, LC3C, 
and GABARAP (Extended Data Fig. 1b, c). Co-immunoprecipitation 
(co-IP) revealed that LC3-lamin B1 interaction occurs at the endog- 
enous level in the nucleus (Fig. 1d, e and Extended Data Fig. 1d). 
Lipidated LC3-II is involved in mediating lamin B1 interaction (Fig. 1d 
and Extended Data Fig. 1e-g), and the LC3 G120A lipidation-deficient
mutant showed impaired binding to lamin B1 (Fig. 1f). A bimolecular 
fluorescence complementation (BiFC) assay’? showed that LC3-lamin 
B1 interaction happens at the nuclear lamina and is dependent on LC3 


lipidation (Extended Data Fig. 1h-j). Together, these data suggest that 
LC3 directly interacts with lamin B1, and that LC3 lipidation facilitates 
this interaction, possibly by tethering LC3 to the inner nuclear mem- 
brane where the interaction with nuclear lamina occurs. 
Figure 1 | LC3 interacts with nuclear lamina protein lamin B1.
a, Proliferating young IMR90 cells were subjected to subcellular
fractionation and immunoblotting. SE, short exposure; LE, long exposure.
b, The nuclear fraction of IMR90 cells was pulled down with bacterially
purified GST or GST-LC3. c, GST-LC3 pull-down of purified lamin B1
protein. d, Endogenous immunoprecipitation in IMR90 cells. e, LC3
immunoprecipitation of IMR90 fractions. f, HEK293T cells were transfected
and subjected to GFP immunoprecipitation and immunoblotting. Bars,
mean ± s.e.m.; n = 3; *P < 0.001, **P < 0.0001; one-way analysis of variance
(ANOVA) coupled with Tukey’s post hoc test (b); unpaired two-tailed
Student’s t-test (f). Uncropped blots are in Supplementary Figure 1.
Figure 2 | LC3 associates with LADs on chromatin. a, IMR90 cells stably
expressing GFP-tagged constructs were subjected to GFP ChIP-quantitative
polymerase chain reaction (qPCR). Uncropped blots are in Supplementary
Figure 1. b, LC3 ChIP-qPCR. Bars, mean ± s.e.m.; n = 3; *P < 0.05,
**P < 0.01, ***P < 0.005; NS, non-significant; unpaired two-tailed
Student’s t-test. c-e, ChIP-sequencing analyses in proliferating IMR90 cells.


Lamin B1 associates with transcriptionally inactive heterochro- 
matin domains called LADs (lamin-associated domains)!"!3. We 
used chromatin immunoprecipitation (ChIP) to investigate the 
association of LC3 with LADs. ChIP of LC3 showed that in its lipidated
form, LC3 associates with LADs but poorly with euchromatin
regions, such as β-actin and PCNA promoters, similarly to that
of lamin B1 (Fig. 2a, b and Extended Data Fig. 2a-c). We then performed
endogenous lamin B1 and LC3 ChIP followed by genome-wide
sequencing (ChIP-seq), done in two independent biological
replicates, R1 and R2 (Fig. 2c for whole chromosome 3 and a zoom-in 
window in Extended Data Fig. 2d). We used enriched domain detec- 
tor (EDD), an algorithm that detects wide enrichment domains 
to define LADs and LC3-associated domains (LC3ADs) (Fig. 2c 
and Extended Data Fig. 2d, black rectangles beneath the tracks). 
Analyses of lamin B1 and LC3 ChIP-seq revealed high reproduci- 
bility between R1 and R2 over LADs and LC3ADs (Fig. 2d, top two 
panels, and Extended Data Fig. 2e, f); LADs defined here correlate 
well with previously identified LADs from lamin B1 ChIP-seq'*!° and 
DamID!? (Extended Data Fig. 2g). We further found that LADs and 
LC3ADs significantly overlap (Fig. 2d, bottom panel; permutation test 
P<0.001, 1,000 iterations). Comparing LADs with an equal number 
of size-matched and randomly selected non-LAD control regions, we 
observed that both lamin B1 and LC3 are strongly enriched in LADs, 
for both replicates (Fig. 2e; permutation test for LC3: P< 0.01, 100 iter- 
ations, for both replicates). A similar enrichment is also detected over 
LC3ADs (Extended Data Fig. 2h). As expected, Lys9 trimethylation 
on histone H3 (H3K9me3) is highly enriched in LADs (Fig. 2e, per- 
mutation test P< 0.01, 100 iterations), whereas H3K4me3 is relatively 
depleted (Fig. 2e, permutation test P= 1, 100 iterations). We also found 
that both lamin B1 and LC3 from our ChIP-seq are strongly enriched 
in LADs mapped by other published studies!3> relative to non-LAD 
control regions (Extended Data Fig. 2i), in line with our findings from 
Fig. 2e. Collectively, these results indicate that LC3 associates with 
LADs on chromatin at the genome-wide scale. 
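
The permutation tests for domain overlap reported above can be illustrated with a minimal sketch. The text does not describe how the domains were permuted, so everything here is an assumption made for illustration: the genome is treated as a circle of equal-sized bins, one domain set is rotated by a random offset to build the null distribution, and the empirical P value uses the (hits + 1) / (iterations + 1) convention. The function name and bin coordinates are hypothetical.

```python
import random


def overlap_permutation_test(a_bins, b_bins, n_bins, iterations=1000, seed=0):
    """Empirical P value for the overlap between two sets of genomic bins.

    Null model (an assumption, not taken from the paper): rotate set B by a
    random non-zero offset around a circular genome of n_bins bins, which
    preserves the size and spacing of its domains.
    """
    rng = random.Random(seed)
    a, b = set(a_bins), set(b_bins)
    observed = len(a & b)
    hits = 0
    for _ in range(iterations):
        shift = rng.randrange(1, n_bins)
        shifted = {(i + shift) % n_bins for i in b}
        # Count permutations that overlap at least as much as the real data.
        if len(a & shifted) >= observed:
            hits += 1
    # The +1 terms keep the estimate away from an impossible P value of zero.
    return observed, (hits + 1) / (iterations + 1)


# Hypothetical example: two domains covering largely the same 100 bins.
observed, p_value = overlap_permutation_test(range(100, 200), range(120, 220), 1000)
```

With this convention, 1,000 iterations cannot give a P value below about 0.001, which matches the resolution of the P < 0.001 reported for 1,000 iterations.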

Next, we examined the biological functions of this interaction, and 
found that neither starvation nor rapamycin treatment downregulates 
lamin B1 protein (Fig. 3a), suggesting that autophagy does not degrade 
lamin B1 during starvation. One scenario that involves lamin B1 loss 
c, Representative tracks over the whole of chromosome 3, for both replicates.
d, Overlap of LADs and LC3ADs between two replicates and lamin B1 and
LC3. e, ChIP-seq enrichment over LADs (+) and randomly selected
non-LAD control regions (−). One-sided Wilcoxon test;
*P < 2.2 × 10⁻¹⁶, P = 1 for H3K4me3.


is oncogenic insult, such as induced by oncogenic RAS!”~'°. In fact, 
most primary cells and tissues cope with oncogenic RAS activity by 
inducing cellular senescence, a stable cell-cycle arrest that serves as a 
potent tumour suppressive mechanism?°!. We and others have shown 
that lamin B1, but not lamins A/C or B2, is dramatically downregulated 
during oncogene-induced senescence’”~’’. Importantly, autophagy is 
Figure 3 | Lamin B1 is an autophagy substrate in response to oncogene
activation. a, b, Primary IMR90 cells were treated as indicated and subjected
to immunoblotting. AA, amino acids. Uncropped blots are in Supplementary
Figure 1. c, IMR90 cells stably expressing mCherry-GFP-lamin B1 and
HRasV12 were stained with LC3 and LAMP1 antibodies, and analysed
by confocal or three-dimensional super-resolution microscopy. Scale bar,
10 µm. d, mCherry-GFP-lamin B1 IMR90 cells were treated as indicated.
Bars, mean ± s.d.; n = 4; *P < 0.01, **P < 0.001; one-way ANOVA coupled
with Tukey’s post hoc test. e, Immuno-TEM analysis of IMR90 cells stably
expressing GFP-lamin B1 and HRasV12. Gold nanoparticles are indicated by
arrows and highlighted on the right. Scale bar, 500 nm.
Figure 4 | Inhibiting autophagy or the LC3-lamin B1 interaction impairs 
lamin B1 degradation. a, ER:HRasV12 IMR90 cells stably expressing 
non-targeting control (sh-NTC) or sh-Atg7 hairpin were induced by 

OHT (4-hydroxytamoxifen) and analysed by immunoblotting. b, Purified 
lamin B1 protein was subjected to pull-down of GST-LC3B wild type or 
mutants. c, Schematic illustration of lamin B1 mutants in binding to LC3. 

d, Regions from lamin A, B1, and B2 were subjected to GST-LC3B pull-down. 
e, Transfected HEK293T cells were subjected to LC3 immunoprecipitation. 
f, ER:HRasV12 IMR90 cells were induced by OHT and analysed by 
immunoblotting. Uncropped blots are in Supplementary Figure 1. 


upregulated during oncogene-induced senescence, and is required for 
the mitosis-to-senescence transition’””*. We thus hypothesized that 
activated oncogenes trigger autophagic degradation of lamin B1 in 
primary human cells. 

Consistent with previous findings'”"’, primary, but not immortal- 
ized, human cells show downregulation of lamin B1 but not other lamin 
isoforms (Fig. 3b and Extended Data Fig. 3a). Although starvation does 
not alter lamin B1 nuclear lamina localization, HRasV12 expression 
induces nuclear membrane blebbing and cytoplasmic lamin B1 signals 
(Extended Data Fig. 3b). Transmission electron microscopy (TEM) 
analysis of HRasV12-expressing cells confirmed the induction of autophagosomes, 
reduction of perinuclear heterochromatin, and induction 
of nuclear membrane blebs (Extended Data Fig. 3c-e). Unlike yeast 
piecemeal microautophagy, in which nuclear blebs directly contact 
cytoplasmic autophagic vacuoles”, the nuclear blebs in human senes- 
cent cells are morphologically distinct and do not directly contact these 
vacuoles (Extended Data Fig. 3c-e). 

We further used an mCherry-GFP-lamin B1 construct to inves- 
tigate the hypothesis that lamin B1 is degraded by the autophagy- 
lysosome pathway. Here, a yellow signal (due to merged mCherry 
and GFP) indicates that the fusion protein is in a neutral pH environ- 
ment, whereas a red signal (due to quenching of GFP) indicates that 
the protein has entered acidic lysosomes”*”®. mCherry-GFP-lamin B1 
showed a merged yellow nuclear peripheral pattern in control cells, 
but displayed cytoplasmic red-only bodies in HRasV12-expressing 
cells (Extended Data Fig. 3f). Inhibiting lysosomal acidification by 
bafilomycin A1 prevents GFP quenching and results in merged yellow 
signals in the cytoplasm (Extended Data Fig. 3g). Furthermore, we 
co-stained with antibodies against LC3 and LAMP1, and found that the 
cytoplasmic mCherry-only lamin B1 bodies stain positively for endogenous 
LC3 and LAMP1 (Fig. 3c). Super-resolution microscopy analysis 
revealed that the cytoplasmic lamin B1 and LC3 co-localize within 
the LAMP1-decorated vesicle (Fig. 3c and Extended Data Fig. 4a). 
Cytoplasmic lamin B1 and nuclear membrane blebs are specifically 
induced by HRasV12, but not by starvation or rapamycin treatment 
(Fig. 3d). In addition, we performed live-cell imaging on mCherry- 
GFP-lamin B1-expressing HRasV12 IMR90 cells, and confirmed a 
nucleus-to-cytoplasm transport process, through nuclear membrane 
blebbing, which then leads to lamin B1 degradation in the cytoplasm 
(Extended Data Fig. 4b). 

Cytoplasmic lamin B1 in HRasV12 cells is reminiscent of the cyto- 
plasmic chromatin fragments (CCF) that we previously described 
in senescent cells, which are fragments of heterochromatin budded 
off from the nuclei!®. Consistent with the behaviour of lamin B1, we 
found cytoplasmic DAPI (4’,6-diamidino-2-phenylindole) specifically 
appearing in response to HRasV12 (Fig. 3d). The cytoplasmic 
DAPI staining bodies are positive for H3K27me3 and H3K9me3, 
and co-localize with LC3 and lamin B1 (Extended Data Fig. 5a-c). 
Immuno-TEM analysis revealed that lamin B1 specifically localizes 
at the nuclear lamina in control cells (Extended Data Fig. 5d, left), 
whereas HRasV12-expressing cells showed decreased presence of lamin 
B1 at the nuclear lamina, and its appearance inside autophagosomes 
and autolysosomes (Fig. 3e and Extended Data Fig. 5d, right). Taken 
together, these data indicate that lamin B1 is an autophagy substrate 
upon oncogenic insult, which, through a nucleus-to-cytoplasm trans- 
port process, leads to its autophagic degradation in the cytoplasm. 

We subsequently investigated the consequence of autophagy inhi- 
bition. Knockdown of Atg7 impairs the downregulation of lamin B1 
protein in HRasV12 cells (Fig. 4a and Extended Data Fig. 6a). Lamin B1 
messenger RNA (mRNA) has been shown to decrease upon HRasV12 
expression17,18. Here the mRNA of lamin B1 is reduced both in control 
and in Atg7 knockdown cells (Extended Data Fig. 6b), whereas 
the protein level of lamin B1 is maintained in Atg7-deficient cells 
(Fig. 4a). These data suggest that lamin B1 is downregulated both at 
mRNA and at protein levels, and are consistent with the observation 
that nuclear lamins are among the most long-lived proteins in cells’. 
Besides RAS-induced senescence, we found that Atg7 inhibition 
also attenuates lamin B1 loss triggered by oxidative stress and DNA 
damage-induced senescence (Extended Data Fig. 6c-e). Further, 
mCherry-GFP-lamin B1 expressed in Atg7 knockdown HRasV12 cells 
displayed normal induction of nuclear membrane blebs but deficient 
cytoplasmic mCherry signals (Extended Data Fig. 6f, g). These data 
suggest that inhibition of autophagy leads to a profound defect in the 
nucleus-to-cytoplasm transport of lamin B1. 

Lamin B1 plays an important role in cell proliferation and senes- 
cence’’. Forced knockdown of lamin B1 causes premature senes- 
cence!®!7, whereas overexpression of lamin B1 delays senescence!’. 
Restoration of lamin B1 in already-established senescent cells is not 
sufficient to revert senescence in vitro (Extended Data Fig. 6h, i). 
Consistent with the compromised lamin B1 degradation, we found that 
Atg7 knockdown cells showed delayed HRasV12-induced senescence, 
as judged by reduced levels of p16 (Fig. 4a and Extended Data Fig. 6j) 
and delayed induction of senescence-associated β-galactosidase (β-gal) 
(Extended Data Fig. 6k). 

We mapped the LC3-lamin B1 interaction and discovered that 
LC3 R10 and R11 are essential for lamin B1 binding, from in vitro 
pull-down, in vivo co-IP, BiFC, and ChIP experiments (Fig. 4b and 
Extended Data Fig. 7a-f). Moreover, while LC3-wild type (WT) 
showed co-localization with CCF, the LC3 mutant failed to do so 
(Extended Data Fig. 7g). On the lamin B1 end, the region between 
Coil 2 and the immunoglobulin (Ig)-fold of lamin B1 is necessary for 
LC3 binding (Fig. 4c and Extended Data Fig. 8a—c). Notably, this region 


5 NOVEMBER 2015 | VOL 527 | NATURE | 107 


© 2015 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Figure 5 graphic, panels a-h, not reproduced. Recoverable labels: growth-curve x axis, 'Time (days)'; schematic in panel h, 'Basal state', 'Activated oncogene', 'Nuclear envelope', 'Lamina', 'CCF', 'Autophagic degradation'.] 


Figure 5 | LC3-lamin B1 interaction is required for lamin B1 degradation 
and cellular senescence. a, In vitro translated proteins were subjected 

to GST-LC3B pull-down. b, BJ ER:HRasV12 cells were analysed by 
immunoblotting. Uncropped blots are in Supplementary Figure 1. 
c, d, Colony formation analysis of BJ ER:HRasV12 cells. A590nm, absorbance 
at 590 nm. e, Mid-life BJ fibroblasts stably expressing mCherry-GFP-tagged 


(390-438) is the most evolutionarily conserved domain among all ver- 
tebrate lamin B1 (Extended Data Fig. 8d, e). The region, along with 20 
amino-acid flanking sequence at the amino and carboxy (N and C) 
termini (resulting in the fragment 370-458), is sufficient to bind LC3 
(Fig. 4c, d and Extended Data Fig. 8f), while the homologous regions on 
other lamins fail to bind LC3 (Fig. 4d). Examination of the amino-acid 
sequences revealed that lamins A/C harbour several distinct residues 
compared with lamin B1, and that lamin B2 has two insertions in the 
region (Extended Data Fig. 8d), which possibly alters the proper pep- 
tide folding for LC3 interaction. 

The 370-458 region of lamin B1 contains its nuclear localization 
signal (NLS) (Fig. 4c), hence the fragment localizes to the nucleus 
(Extended Data Fig. 8g) and is able to interact with endogenous 
LC3 (Fig. 4e). Overexpression of this fragment decreases endoge- 
nous LC3-lamin B1 interaction, but does not affect LC3 lipidation, 
LC3 binding to p62 (Fig. 4e), or p62 degradation upon starvation 
(Extended Data Fig. 8h). When expressed in HRasV12 cells, the fragment 
impairs lamin B1 downregulation, accompanied by attenuated 
senescence (Fig. 4f and Extended Data Fig. 8i-k). 

We further identified the essential residues within lamin B1 for bind- 
ing to LC3, and found that simultaneously substituting the residues 
S393, S395, S396, R397, and V398 to alanine abrogates the interaction 
with LC3 (Fig. 5a and Extended Data Fig. 9a-g). In control cells, this 
lamin B1 substitution mutant shows a normal nuclear peripheral pattern 
(Extended Data Fig. 9h) and is able to interact with endogenous lamin A 
and lamin B1 (Extended Data Fig. 9j). However, in HRasV12 cells, the 
mutant showed attenuated protein downregulation compared with WT 
lamin B1 (Fig. 5b and Extended Data Fig. 9k), and dramatically reduced 
cytoplasmic lamin B1 signals (Extended Data Fig. 9h, i), indicating that 
the mutant has a profound deficiency in nucleus-to-cytoplasm trans- 
port. Consequently, the lamin B1 mutant-expressing cells delayed 
constructs were recorded for growth. Uncropped blots are in Supplementary 
Figure 1. f, Day 60, quantified for β-gal positivity. g, Day 101, quantified 
for cytoplasmic DAPI. Bars, mean ± s.e.m. (c, d), s.d. (f, g); n = 3 (c, d), 
n = 4 (f, g); *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001; NS, 
non-significant; one-way ANOVA coupled with Tukey’s post hoc test. 

h, Schematic illustration of autophagy degradation of nuclear lamina. 


HRasV12-induced senescence with a higher efficiency than WT lamin 
B1 (Fig. 5b and Extended Data Fig. 9l), and significantly promoted the 
growth of colonies in colony-formation analysis (Fig. 5c). Furthermore, 
we used our lamin B1 370-458 peptide that blocks the LC3-lamin B1 
interaction and inhibits senescence (Fig. 4e, f). Introducing point muta- 
tions as mapped above (Fig. 5a) abrogates the peptide association with 
LC3 (Extended Data Fig. 10a). While the 370-458 peptide delayed cellular 
senescence induced by HRasV12, the 370-458 mutant failed to 
do so (Fig. 5d and Extended Data Fig. 10b). Besides oncogene-induced 
senescence, the peptide also significantly delayed replicative senescence 
and the appearance of CCF (Fig. 5e-g and Extended Data Fig. 10c-e). 
Taken together, these data indicate that the LC3-lamin B1 interaction 
plays an essential role in reinforcing cellular senescence, which both 
suppresses oncogene activity and limits cellular lifespan. 

In this study, we discovered lamin B1 as a selective mammalian auto- 
phagy substrate upon oncogenic and genotoxic insults (illustrated in 
Fig. 5h). Recently, starvation-induced nuclear autophagy was discov- 
ered in yeast*®, which is devoid of nuclear lamina and malignancies. 
In contrast, we show that mammalian lamin B1 degradation does not 
occur during starvation. Recent studies reveal that downregulation of 
lamin B1 impairs cell proliferation and DNA repair!®!”?9°, and leads 
to large-scale alterations in chromatin'®. These dramatic changes are 
unlikely to happen during starvation, but are probably beneficial in 
restraining oncogenic and tumorigenic insults. Our study suggests that 
LC3-lamin B1 interaction occurs in the basal cellular state, and, upon 
aberrant cellular activities, initiates lamin B1 degradation (Fig. 5h) 
thus driving senescence to restrain cell proliferation. Hence, selective 
nuclear lamina degradation by autophagy may play a role in restricting 
tumorigenesis and maintaining cell and tissue integrity. 

Although our current work focuses on lamin B1, we anticipate that 
other nuclear substrates of autophagy have roles in tumour suppression 
and other physiological/pathological scenarios. This study establishes 

METHODS 


Data reporting. No statistical methods were used to predetermine sample size. The 
experiments were not randomized. The investigators were not blinded to allocation 
during experiments and outcome assessment. 

Cell lines and culture. IMR90, mouse embryonic fibroblasts, and HEK293T were 
described previously’**". Primary BJ fibroblasts were purchased from ATCC. Cell 
line identities were not further authenticated. The cells were cultured in DMEM 
supplemented with 10% fetal bovine serum (FBS), 100 U ml⁻¹ penicillin, and 
100 µg ml⁻¹ streptomycin (Invitrogen), and were intermittently tested for mycoplasma. 
IMR90 and BJ were cultured under physiological oxygen (3%), except for 
the H2O2 treatment, in which cells were cultured in an incubator with 20% oxygen, 
and the experiments involved in live-cell imaging. For primary cell cultures, cells 
were briefly washed with PBS, trypsinized at 37°C for 2-4 min, and passaged at no 
more than 1:4 dilutions. Cells were counted with a Countess automated cell counter 
(Life Technologies), and the numbers were recorded where growth curves were 
generated. HEK293T cells were transfected using Lipofectamine 2000 (Invitrogen). 
For amino-acid starvation, cells were incubated in Hank’s buffer (with calcium 
and glucose) supplemented with 10% dialysed FBS and 1% HEPES (Invitrogen). 
For amino-acid and serum deprivation, cells were cultured in Hank’s buffer plus 
1% HEPES. 

Retrovirus and lentivirus infection. Stable cell lines were made by retrovirus or 
lentivirus infection, as previously described31, with slight modifications. Retroviral 
constructs were transfected into the Phoenix packaging cell line. Lentiviral pLKO constructs 
were transfected with packaging plasmids into HEK293T cells. Viral supernatant 
was filtered through a 0.45-µm filter, supplemented with 8 µg ml⁻¹ polybrene, 
and mixed with trypsinized recipient cells. pLNCX-ER:HRasV12, WZL-hygro, 
and WZL-HRasV12-hygro viral constructs were described elsewhere”. The sh-Atg7 
hairpin sequence GGAGTCACAGCTCTTCCTTAC was from ref. 22, and was cloned 
into the Tet-pLKO-puro ‘all-in-one’ tetracycline-inducible vector32. Doxycycline (100 ng 
ml⁻¹) was added to IMR90 cells to induce knockdown of Atg7. Another pLKO-shAtg7 
construct (TRCN0000007587) was purchased from Sigma-Aldrich and used in BJ 
fibroblasts. The infected cells were selected with puromycin, neomycin, or hygro- 
mycin for about 1 week. 

Reagents and antibodies. Rapamycin was purchased from Millipore. H2O2 was 
from Fisher Scientific. 4-Hydroxytamoxifen and etoposide were from Sigma- 
Aldrich. The following antibodies were used: LC3 (MBL PM036 for WB of mouse 
embryonic fibroblasts; Cell Signaling Technology 3868 for immunoprecipitation, 
ChIP, IF, WB; Cell Signaling Technology 2775 for WB), β-tubulin (Sigma-Aldrich 
T4026), calreticulin (Cell Signaling Technology 12238), COX IV (Cell Signaling 
Technology 4850), Atg5 (Cell Signaling Technology 8540), Atg7 (Cell Signaling 
Technology 8558), lamin B1 (Abcam ab16048), lamin B2 (Abcam ab8983), lamins 
A/C (Millipore MAB3211), GFP (Roche 11 814 460 001 and Abcam ab290), 
p62 (Abnova H00008878-M01), GAPDH (Fitzgerald Industries 10R-G109A), 
p16 (Abcam ab16123), Ras (Millipore 05-516), HA (Sigma-Aldrich H3663), 
H3K27me3 (Active Motif 39538), H3K9me3 (Abcam ab8898), LAMP1 (Iowa 
Hybridoma Bank H4a3-s), and Flag (Sigma-Aldrich F1804). 

Plasmids. GST, GST-LC3A, B, C, and GST-LC3B mutants/truncations were 
described elsewhere33. GFP, HA/Flag/GFP-LC3 WT and mutants, GFP-Beclin 
1, GFP-ULK1, GFP-lamin B1, and split Venus constructs were described previously17,31,34,35. 
pBabe-mCherry-GFP-LC3 (ref. 36) was purchased from Addgene, 
and LC3 was truncated to make pBabe-mCherry-GFP, and then lamin B1 
sequences were cloned. Lamin B1 truncations/mutations were made from pEGFP- 
lamin B1 for direct transfection, pBabe-mCherry-GFP-lamin B1 for retrovirus, or 
pT7-NHA-lamin B1 for in vitro translation. Tet-inducible lentiviral GFP-lamin B1 
was made by cloning the GFP-lamin B1 fragment into pTRIPZ. All new constructs 
in this study were verified by DNA sequencing. 

Western blotting. Cells were lysed in buffer containing 50 mM Tris pH 7.5, 0.5mM 
EDTA, 150mM NaCl, 1% NP40, 1% SDS, supplemented with 1:100 Halt Protease 
inhibitor cocktail (Thermo Scientific). The lysates were briefly sonicated, and 
supernatants were subjected to electrophoresis using NuPAGE Bis-Tris precast 
gels (Life Technologies). After transferring to nitrocellulose membrane, 5% milk in 
TBS supplemented with 0.1% Tween 20 (TBST) was used to block the membrane 
at room temperature (~25°C) for 1h. Primary antibodies were diluted in 5% BSA 
in TBST, and incubated at 4°C overnight. The membrane was washed three times 
with TBST, each for 10 min, followed by incubation of HRP-conjugated secondary 
antibodies at room temperature for 1h, in 5% milk/TBST. The membrane was 
washed again three times, and imaged by a Fujifilm LAS-4000 imager. 
Immunoprecipitation. Cells were lysed in immunoprecipitation buffer containing 
20 mM Tris, pH 7.5, 137 mM NaCl, 1 mM MgCl2, 1 mM CaCl2, 1% NP-40, 10% 
glycerol, supplemented with 1:100 Halt protease and phosphatase inhibitor cocktail 
(Thermo Scientific) and benzonase (Novagen) at 12.5 U ml⁻¹. Benzonase is 
essential to release chromatin-bound proteins into the supernatant, and MgCl2 is critical 
for its activity. The lysates were rotated at 4°C for 30-60 min. The supernatant was 
incubated with antibody-conjugated Dynabeads (Life Technologies), and rotated 
at 4°C overnight. The immunoprecipitates were collected by magnet, washed 
five times with immunoprecipitation buffer, and boiled with NuPAGE loading 
dye. Samples were analysed by western blotting. 

In vitro translation. Cell-free in vitro translation was performed using the 1-Step 
In Vitro Translation Kit (Thermo Scientific), following the manufacturer’s guid- 
ance. Target proteins were cloned into pT7CFE1-NHA vector (with N-terminal 
HA tag) and translated in vitro at 30°C. 

Bacteria expression and GST pull-down. GST-tagged constructs were trans- 
formed into BL21-CodonPlus Escherichia coli and purified with glutathione beads 
(Life Technologies). Lamin B1 370-458 and 390-438 fragments were cloned into 
GST construct with a TEV protease recognition site between GST and the cloned 
sequences. The expressed proteins were loaded and purified with glutathione aga- 
rose beads, and digested with His-tagged TEV protease. The resulting supernatant 
was further purified with Ni-NTA beads (Qiagen) to remove His-tagged TEV 
protease. 

For GST pull-down, bacterial lysates were incubated with glutathione beads 
at 4°C for 2 h and washed four times with buffer containing 50 mM Tris, pH 7.5, 
150 mM NaCl, 1% Triton X-100, 1 mM DTT, supplemented with 100 µM PMSF. 
The purified proteins or in vitro translated proteins were diluted in binding buffer 
(20 mM Tris, pH 7.5, 137 mM NaCl, 1 mM MgCl2, 1 mM CaCl2, 1% NP-40, supplemented 
with 1:1,000 Halt Protease inhibitor cocktail) and then pre-cleared with 
GST at 4°C for 1h. The resulting supernatant was then subjected to GST pull-down 
with GST or GST fusion proteins. The product was washed four times with binding 
buffer and boiled with NuPAGE loading dye for immunoblotting analysis. Purified 
lamin B1 protein was purchased from Origene. 
Immunofluorescence and live-cell imaging. For immunofluorescence, cells were 
fixed in 4% paraformaldehyde in PBS for 30 min at room temperature. Cells were 
washed twice with PBS, and permeabilized with 0.5% Triton X-100 in PBS for 
10min. After washing two times, cells were blocked in 10% BSA in PBS for 1h at 
room temperature. Cells were incubated with primary antibodies in 5% BSA in 
PBS supplemented with 0.1% Tween 20 (PBST) overnight at 4°C. The next day, 
cells were washed four times with PBST, each for 10 min, followed by incubation 
with Alexa Fluor-conjugated secondary antibody (Life Technologies) in 5% BSA/ 
PBST for 1h at room temperature. Cells were then washed four times in PBST, 
incubated with 1 µg ml⁻¹ DAPI in PBS for 5 min, and washed twice with PBS. The 
slides were then mounted with ProLong Gold (Life Technologies) and imaged 
with a Leica TCS SP8 fluorescent confocal microscope. The slides were mounted 
with ProLong Diamond (Life Technologies) for 5 days at room temperature for 
super-resolution microscopy. 

Three-dimensional structured illumination microscopy was performed using 
N-SIM Super-resolution Microscope System (Nikon) with an oil immersion 
objective lens CFI SR (Apochromat TIRF x 100, 1.49 numerical aperture; Nikon). 
Twenty to forty-one optical sections were collected with a 200 nm interval between 
neighbouring sections. 

For live-cell imaging, mCherry-GFP-lamin B1 HRasV12 cells were plated onto 
a 35 mm glass-bottom dish (MatTek P35G-0-14-C) pre-coated with poly-L-lysine 
(Sigma-Aldrich). The dish was imaged with a spinning disk fluorescent confocal 
microscope (Olympus IX71 and IX81 Inverted System, coupled with an Andor 
iXon3 EMCCD camera, with motorized x-y stage, Okolab stagetop incubation 
chamber, and MetaMorph acquisition software). Cells were imaged overnight 
every 15 min. Twelve z-sections were acquired covering the entire individual cell. 
Images were viewed and presented as the maximum projection from all z-sections. 
TEM. For immuno-gold TEM, GFP-lamin B1 expressing IMR90 cells were sub- 
jected to high-pressure freezing. The samples were then dehydrated by freeze sub- 
stitution methods for 72h at —90°C in 0.1% uranyl acetate/acetone followed by 
embedding in Lowicryl HM20 at —50°C with 360 nm light polymerization of the 
resin for 48 h. Resin-embedded cells were sectioned at 70 nm thickness. GFP-lamin 
B1 was detected with a GFP antibody*® diluted 1:50 in 5% BSA, 0.1% fish gelatin, 
in PBS. Gold colloids (10 nm) conjugated to goat anti-rabbit (Electron Microscopy 
Sciences) at 1:200 were used for secondary detection of GFP-antigen conjugates 
followed by a 0.2% glutaraldehyde post-fix to stabilize the immuno-protein complexes. 
Imaging was performed at 80 keV on a JEOL 1010 at indicated magnifications 
and collected digitally on an AMT side-entry CCD (charge-coupled device) 
without post-labelling heavy-metal staining. For TEM analysis of ultrastructures 
of control and HRasV12 IMR90, cells were subjected to high-pressure freezing, 
followed by standard TEM procedures. 

ChIP, RT-qPCR, and ChIP sequencing. These assays were performed as described 
previously’ with slight modification. In brief, cells were crosslinked with 1% 
formaldehyde diluted in PBS, without the addition of other co-crosslinkers, for 
5 min at room temperature. After glycine quenching, the cell pellets were lysed 
in buffer containing 50 mM Tris, pH 7.5, 150mM NaCl, 1% Triton X-100, 0.1% 
Na-deoxycholate, 0.1% SDS, supplemented with complete protease inhibitor 
cocktail (Thermo Scientific), and sonicated with a Covaris sonicator, resulting in 
chromatin fragments with an average size of 250 base pairs. The supernatant was 
diluted ten times with the above buffer without SDS, and subjected to immuno- 
precipitations with 2 ug of antibody or control IgG conjugated with Dynabeads 
Protein A or G (Invitrogen) at 4°C overnight. The beads were then washed five 
times with buffer containing 50 mM Tris, pH 7.5, 150 mM NaCl, 1% Triton X-100, 
and once with final wash buffer (50 mM Tris, pH 8.0, 10 mM EDTA, 50 mM NaCl), 
followed by elution with incubation of elution buffer (final wash buffer plus 1% 
SDS) at 65°C for 30 min with agitation in a thermomixer. The ChIP and input 
were then purified and used for qPCR analysis or for constructing sequencing 
libraries with a NEBNext Ultra kit (New England Biolabs). For ChIP-sequencing, 
the libraries were quantified (Kapa Biosystems) and were single-end sequenced 
on an Illumina NextSeq 500. 

The following primers were used for qPCR analyses of LADs. LAD1: forward, 
AGAGACGTGGCGTGTGTCC; reverse, GGCACTGAAGCCACCTCTGT 
(chromosome 4: 190524973-190525023). LAD2: forward, ATTTGC 
ACAATCTGAGGGCG; reverse, CTGGGCAATTCCCTTGGTAGT (chromosome 
7: 35434121-35434171). LAD3: forward, GCATCCATTTCACATCCTTGG; reverse, 
CCCATTGCCTCTGAAGTTTTGT (chromosome 8: 130184820-130184870). 
Subcellular fractionation. This was performed with the subcellular fractionation 
kit for cultured cells (Thermo Scientific 78840) according to the manufacturer's 
instructions, with slight modification. Benzonase (Novagen) was used to digest 
chromatin-bound proteins in the nuclear fraction, in the buffer supplemented 
with 5 mM MgCl2. 

Senescence-associated β-gal assay. β-Galactosidase assays were performed using a 
cellular senescence assay kit (Chemicon KAA002), according to the manufacturer's 
protocol. Cells were incubated with β-gal detection solution at 37°C overnight, 
and quantified under regular light microscopy. At least 200 cells were scored for 
β-gal positivity with over four different fields. 

Computational methods. Alignment of vertebrate lamin B1 proteins was done 
using ClustalX 2.1 (ref. 37). Computational analysis of ChIP-seq was performed 
as previously described and as follows. 

Data source: H3 (GEO accession number GSM897555), H3K4me3 (GEO 
GSM897556). H3K9me3 ChIP-seq data (GEO GSM942075 and GSM942119) 
were published elsewhere!>**, LC3 and lamin B1 ChIP-seq data in this study have 
been deposited in the GEO (http://www.ncbi.nlm.nih.gov/geo) under accession 
number GSE63440. 

Alignment of lamin B1, LC3, and input: all ChIP-seq data were aligned to the 
GRCh37 (hg19) assembly of the human genome using bowtie2 with command-line 
parameters -k1 -N1 --local (allowing and reporting a single alignment per read 
with one or zero mismatch permitted in the seed region). 
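
As an illustration of the alignment step above, the following minimal sketch assembles a bowtie2 argument list with the stated options. The index basename and file names are hypothetical placeholders, and the use of `-x`, `-U`, and `-S` for single-end input and SAM output is an assumption about how the reads were supplied; the source gives only the `-k`, `-N`, and `--local` settings.

```python
from typing import List


def bowtie2_command(index_prefix: str, fastq: str, sam_out: str) -> List[str]:
    """Build a bowtie2 argument list for single-end ChIP-seq reads.

    The three alignment options mirror the text; everything else
    (index basename, input and output paths) is supplied by the caller.
    """
    return [
        "bowtie2",
        "--local",            # local alignment mode
        "-k", "1",            # report at most one alignment per read
        "-N", "1",            # allow up to one mismatch in the seed
        "-x", index_prefix,   # basename of the bowtie2 index (hypothetical)
        "-U", fastq,          # unpaired (single-end) reads
        "-S", sam_out,        # SAM output file
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(bowtie2_command("hg19", "lc3_chip.fastq.gz", "lc3_chip.sam")))
```

The list form can be passed directly to `subprocess.run` without shell quoting.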

Track generation: ChIP-seq visualization tracks were created in the following 
way. Aligned sequence tags were subjected to BEDTools’ genomeCoverageBed 
tool, making bedGraphs that were multiplied by the RPM coefficient. A similarly 
normalized input bedGraph was then subtracted from the lamin B1 and LC3 tracks, and 
the resulting bedGraphs were made into bigWigs using the University of California Santa Cruz 
Genome Browser’s bedGraphToBigWig utility. 
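
The normalization described above can be sketched on per-bin tag counts as follows. This is a toy illustration rather than the authors' pipeline: it assumes the RPM coefficient is one million divided by the number of mapped reads, and that ChIP and input have already been binned on identical coordinates; the function names are hypothetical.

```python
from typing import List, Sequence


def rpm_scale(counts: Sequence[int], total_mapped_reads: int) -> List[float]:
    """Scale per-bin tag counts to reads per million mapped reads (RPM)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    coeff = 1_000_000 / total_mapped_reads
    return [c * coeff for c in counts]


def input_subtracted_track(
    chip_counts: Sequence[int],
    chip_total: int,
    input_counts: Sequence[int],
    input_total: int,
) -> List[float]:
    """Return the RPM-normalized ChIP signal minus the RPM-normalized input."""
    if len(chip_counts) != len(input_counts):
        raise ValueError("ChIP and input must cover the same bins")
    chip_rpm = rpm_scale(chip_counts, chip_total)
    input_rpm = rpm_scale(input_counts, input_total)
    return [c - i for c, i in zip(chip_rpm, input_rpm)]


if __name__ == "__main__":
    # Three bins, 2 million ChIP reads, 4 million input reads.
    print(input_subtracted_track([10, 0, 30], 2_000_000, [4, 4, 4], 4_000_000))
```

Bins where input exceeds ChIP come out negative, which is why subtracted tracks can dip below zero.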

Box plot: aligned tag counts were assessed for each LAD for all marks under 
study, as well as the corresponding input and H3. The distribution of ChIP enrich- 
ment (ChIP-background) was computed over all LADs or over an equal number of 
size-matched background regions, sampled from all genomic positions that did not 
overlap with LADs. Hypothesis testing was done by Mann-Whitney/Wilcoxon tests. 
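
For readers unfamiliar with the test named above, here is a minimal sketch of a one-sided Mann-Whitney/Wilcoxon rank-sum comparison of LAD enrichment against background enrichment. The source does not state which software was used; this hand-rolled version uses the normal approximation without tie or continuity correction, is quadratic in sample size, and is meant only to show what is being computed.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_greater(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U, p) for the one-sided alternative that x tends to exceed y.

    U counts the (x_i, y_j) pairs with x_i > y_j, with ties counted as 0.5.
    The p-value uses the large-sample normal approximation.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability of z
    return u, p


if __name__ == "__main__":
    lad_enrichment = [5.0, 6.0, 7.0, 8.0]         # toy values
    background_enrichment = [1.0, 2.0, 3.0, 4.0]  # toy values
    print(mann_whitney_greater(lad_enrichment, background_enrichment))
```

With genome-scale numbers of domains, an established statistics package would be the practical choice.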

Overlap permutation test: to determine whether LADs were significantly associ- 
ated with LC3ADs, the number of base pairs in common between LADs and other 
domains was tabulated using BEDTools intersect (default two-set comparison). In 
each of 1,000 iterations, LAD coordinates were randomly shuffled using BEDTools, 
creating 1,000 sets of equal-sized control regions. Each control set was scored for 
the number of base pairs in common with LC3ADs, and the frequency with which 
control sets shared more genomic space with LC3ADs than the LADs did was taken 
to be an estimate of the probability that the LAD-LC3AD association was due 
to chance. 
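
The overlap permutation test can be sketched on a toy one-chromosome genome as below. This is an illustrative simplification, not a reimplementation of BEDTools: intervals are half-open, shuffled intervals may overlap one another, and the comparison uses "greater than or equal" as a conservative choice. All names and the seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def overlap_bp(a: Sequence[Interval], b: Sequence[Interval]) -> int:
    """Total base pairs shared between two interval sets.

    Assumes intervals within each set do not overlap one another;
    otherwise shared bases are counted more than once.
    """
    total = 0
    for s1, e1 in a:
        for s2, e2 in b:
            total += max(0, min(e1, e2) - max(s1, s2))
    return total


def shuffle_intervals(
    intervals: Sequence[Interval], genome_len: int, rng: random.Random
) -> List[Interval]:
    """Move each interval to a uniformly random position, keeping its length."""
    shuffled = []
    for s, e in intervals:
        length = e - s
        start = rng.randrange(0, genome_len - length + 1)
        shuffled.append((start, start + length))
    return shuffled


def overlap_permutation_p(
    lads: Sequence[Interval],
    lc3ads: Sequence[Interval],
    genome_len: int,
    iterations: int = 1000,
    seed: int = 0,
) -> float:
    """Fraction of shuffled LAD sets overlapping LC3ADs at least as much as observed."""
    rng = random.Random(seed)
    observed = overlap_bp(lads, lc3ads)
    hits = sum(
        1
        for _ in range(iterations)
        if overlap_bp(shuffle_intervals(lads, genome_len, rng), lc3ads) >= observed
    )
    return hits / iterations


if __name__ == "__main__":
    lads = [(0, 100), (500, 700)]
    lc3ads = [(10, 110), (520, 690)]
    print(overlap_permutation_p(lads, lc3ads, genome_len=100_000))
```

A small returned fraction means that randomly placed domains rarely match the observed overlap.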

Area under the curve permutation test: for Fig. 2e, a permutation test for LC3, 
H3K9me3, and H3K4me3 over LADs was performed. In each of 100 iterations, 
LADs coordinates were randomly shuffled using BEDTools, creating 100 sets of 
equal-sized non-LADs control regions. LADs as well as each of the 100 non-LAD 
control sets were scored for LC3, H3K9me3, and H3K4me3 enrichment, and the 
number of control sets in which the median score was greater than or equal to the 
median value of the LAD distribution was tabulated. That frequency was taken 
to be an estimate of the probability that enrichment over LADs was due to 
chance; that is, the probability of the null hypothesis that LADs and non-LADs had 
the same median enrichment. The P-value for H3K9me3 was less than 0.01, and 
the P-value for H3K4me3 was 1. This test was repeated using the 75th percentile 
value as the test statistic and with the 90th percentile value, with the same result 
in both cases. 
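
The counting step of this test can be written compactly. The sketch below assumes enrichment scores have already been computed for the LADs and for each shuffled control set; the function name is hypothetical. With 100 control sets, a count of zero corresponds to an estimate reported as P < 0.01, and a count of 100 to P = 1.

```python
from statistics import median
from typing import Sequence


def median_permutation_p(
    lad_scores: Sequence[float],
    control_sets: Sequence[Sequence[float]],
) -> float:
    """Fraction of control sets whose median is >= the median over LADs."""
    if not control_sets:
        raise ValueError("at least one control set is required")
    observed = median(lad_scores)
    hits = sum(1 for ctrl in control_sets if median(ctrl) >= observed)
    return hits / len(control_sets)


if __name__ == "__main__":
    lad_scores = [5.0, 6.0, 7.0]              # toy enrichment values over LADs
    controls = [[1.0, 2.0, 3.0]] * 100        # toy shuffled control sets
    print(median_permutation_p(lad_scores, controls))
```

Swapping `median` for a 75th or 90th percentile function gives the variants mentioned in the text.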

Domain detection: enriched domains for lamin B1 and LC3 were called using 
EDD" with default bin size estimation and gap penalty estimation, and unalign- 
able regions (the hg19 assembly gap track from Genome Reference Consortium) 
masked. The false discovery rate was controlled at the default value of 5%. 
Statistical analysis. Student's t-test was used for comparison between two groups. 
One-way ANOVA coupled with Tukey’s post hoc test was used for comparisons 
among more than two groups. Differences were considered significant when the P-value was less than 0.05. 
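
To make the choice of test concrete, the sketch below selects a test by the number of groups and computes the one-way ANOVA F statistic by hand. It is an illustration only: the Tukey post hoc step and the conversion of F to a P-value are not implemented, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def choose_test(n_groups: int) -> str:
    """Return the test named in the text for a given number of groups."""
    if n_groups < 2:
        raise ValueError("need at least two groups")
    return "student_t" if n_groups == 2 else "one_way_anova_tukey"


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need >= 2 non-empty groups and more values than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    # Raises ZeroDivisionError if every group has zero internal variance.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
    print(choose_test(len(toy_groups)), one_way_anova_f(toy_groups))
```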


31. Dou, Z. et al. Class IA PI3K p110β subunit promotes autophagy through Rab5 
small GTPase in response to growth factor limitation. Mol. Cell 50, 29-42 
(2013). 

32. Wiederschain, D. et al. Single-vector inducible lentiviral RNAi system for 
oncology target validation. Cell Cycle 8, 498-504 (2009). 

33. Kirkin, V. et al. A role for NBR1 in autophagosomal degradation of 
ubiquitinated substrates. Mol. Cell 33, 505-516 (2009). 

34. Pan, J. A., Ullman, E., Dou, Z. & Zong, W. X. Inhibition of protein degradation 
induces apoptosis through a microtubule-associated protein 1 light chain 
3-mediated activation of caspase-8 at intracellular membranes. Mol. Cell. Biol. 
31, 3158-3170 (2011). 

35. Zhong, Y. et al. Distinct regulation of autophagic activity by Atg14L and 
Rubicon associated with Beclin 1-phosphatidylinositol-3-kinase complex. 
Nature Cell Biol. 11, 468-476 (2009). 

36. N’Diaye, E. N. et al. PLIC proteins or ubiquilins regulate autophagy-dependent 
cell survival during nutrient starvation. EMBO Rep. 10, 173-179 (2009). 

37. Larkin, M. A. et al. Clustal W and Clustal X version 2.0. Bioinformatics 23, 
2947-2948 (2007). 

38. Chandra, T. et al. Independence of repressive histone marks and chromatin 
compaction during senescent heterochromatic layer formation. Mol. Cell 47, 
203-214 (2012). 




[Extended Data Figure 1 graphic, panels a-j, not reproduced. Recoverable labels: 'Purified lamin B1', 'GFP-LC3 WT', 'GFP-LC3 G120A', 'Untr', 'Starved', 'Atg5 status', 'VN-LC3', 'VC-Lamin B1', 'Venus', 'Merge'.] 


Extended Data Figure 1 | Characterization of LC3 and lamin B1
association. a, Protein gel staining of purified lamin B1 protein.
b, c, Purified lamin B1 protein was subjected to GST pull-down.
d, Endogenous LC3 immunoprecipitation in HEK293T cells. e, IMR90 stably
expressing GFP-LC3 constructs were starved and imaged. f, Endogenous
co-IP in wild-type and Atg5 knockout mouse embryonic fibroblasts.
g, Nuclear fractions of control and Atg7 knockdown IMR90 cells were
analysed by LC3 immunoprecipitation. h-j, BiFC analysis of LC3-lamin
B1 interaction. HeLa cells were transfected with the indicated combination
of split Venus constructs and analysed as follows. h, Cells were fixed and
imaged. i, Lysates were analysed by immunoblotting. j, Cells were scored
for Venus positivity. Bars, mean ± s.d.; n = 4, with over 500 cells; *P < 0.001;
unpaired two-tailed Student's t-test.

Extended Data Figure 2 | LC3 interacts with LADs on chromatin.
a, b, ChIP-qPCR of proliferating IMR90. c, ChIP-qPCR of LC3 knockdown
IMR90. Bars, mean ± s.e.m. (a, b), s.d. (c); n = 3; *P < 0.05, **P < 0.005,
***P < 0.0001; NS, non-significant; unpaired two-tailed Student's t-test.
d-i, ChIP-sequencing analyses. d, Related to Fig. 2c, a zoom-in window
of chromosome 3. e, f, Analyses of two replicates at LADs and LC3ADs.
g, Per-nucleotide overlap of published data sets with the LADs called
from this study. Number unit: megabases. h, Enrichment over LC3ADs.
*P < 2.2 × 10⁻¹⁶; one-sided Wilcoxon test. i, Analysis of our lamin B1 and
LC3 ChIP-seq at LADs defined by other studies, and randomly sampled
non-LAD loci (Ctrl). *P < 2.2 × 10⁻¹⁶; one-sided Wilcoxon test.

Extended Data Figure 3 | Lamin B1 degradation upon HRasV12-induced
senescence. a, Related to Fig. 3b. Immunoblotting of immortalized IMR90.
b, GFP-lamin B1 stably expressing IMR90 cells were treated as indicated and
imaged. Cytoplasmic signals are indicated by arrows. c-e, TEM analyses of
IMR90. Nu, nucleus. f, IMR90 cells stably expressing mCherry-GFP-lamin
B1 were imaged and quantified. g, Cells as in f were treated with bafilomycin
A1 and imaged under confocal microscopy.

Extended Data Figure 4 | Imaging analyses of mCherry-GFP-lamin B1
HRasV12 cells. a, Related to Fig. 3c. IMR90 cells stably expressing
mCherry-GFP-lamin B1 and HRasV12 were imaged by three-dimensional
super-resolution microscopy. Sections shown span the top, middle, and bottom
layers of the cell. The mCherry channel was deliberately under-exposed to
prevent over-saturation of the cytoplasmic signals. Scale bar, 5 µm. The insets
are presented in Fig. 3c. b, Live-cell imaging of mCherry-GFP-lamin B1
HRasV12 IMR90. Images shown are the maximum projection combining all
z-sections. Nucleus-to-cytoplasm transport events are labelled sequentially
as indicated. Note the initial yellow signal, followed by disappearance of GFP
then mCherry, in events 1 and 3; event 2 was not yet degraded by the end of
the imaging.




Extended Data Figure 5 | CCF and lamin B1 are targeted by autophagy.
a, b, IMR90 cells stably expressing GFP-LC3 and HRasV12 were stained with
indicated antibodies and imaged under confocal microscopy. Cytoplasmic
events are labelled by arrows. c, HRasV12 IMR90 cells were stained with
LC3 antibody. d, Related to Fig. 3e, immuno-TEM analysis of GFP-lamin
B1 IMR90 cells. Cells were stained with a GFP antibody and conjugated with
10 nm gold particles. Gold particles are indicated by arrows.

Extended Data Figure 6 | Knockdown of Atg7 attenuates lamin
B1 downregulation. a, Related to Fig. 4a, quantification of lamin
B1 immunoblots. Bars, mean ± s.e.m.; n = 3; *P < 0.05, **P < 0.005,
***P < 0.0001, compared with sh-NTC day 0; NS, non-significant.
b, Reverse transcribed qPCR of cells as in Fig. 4a. Data are the mean
normalized to GAPDH ± s.e.m.; n = 3. c, d, IMR90 cells were treated as
indicated and analysed by immunoblotting. e, BJ cells were treated with
etoposide and analysed by immunoblotting. f, g, Atg7 knockdown inhibits
mCherry-GFP-lamin B1 nucleus-to-cytoplasm transport. Bars are
mean ± s.d.; n = 4, over 100 cells; *P < 0.0001. h, i, ER:HRasV12 BJ cells
stably expressing Dox-inducible GFP or GFP-lamin B1 were either left
uninduced (bars 1 and 2), or induced with 4-OHT for 3 weeks (3-6). Cells
were then induced with Dox (in the presence of 4-OHT) for an additional
2 weeks (5 and 6). i, Quantification of β-gal positivity. Bars, mean ± s.d.;
n = 4, over 200 cells. j, Related to Fig. 4a, quantification of p16 immunoblots.
Bars, mean ± s.e.m.; n = 3; *P < 0.05, compared with corresponding sh-NTC
controls. k, ER:HRas IMR90 cells were scored for β-gal positivity.
Bars, mean ± s.d.; n = 4, over 200 cells; *P < 0.0005, **P < 0.0001. One-way
ANOVA coupled with Tukey’s post hoc test for a and i; all other tests were
unpaired two-tailed Student's t-tests.


combination of split Venus constructs. Bars, mean ± s.d.; n = 4, over 500 cells;
*P < 0.0001. f, IMR90 cells stably expressing the indicated constructs were
arrows.

Extended Data Figure 8 | Mapping of LC3-lamin B1 interaction.
a, HEK293T cells transfected with indicated constructs were analysed by
GST-LC3B pull-down. b, c, In vitro translated constructs were subjected
to GST-LC3B pull-down. d, e, Evolutionary analyses of vertebrate lamin
B1 and the corresponding regions of other lamin isoforms. e, Number of
conserved residues normalized to total residues. f, Bacterially purified
fragments were analysed by GST-LC3B pull-down. g, mCherry-GFP-lamin
B1 370-458 localizes to the nucleus. h, Cells were starved and analysed by
immunoblotting. i, j, Related to Fig. 4f, quantification of lamin B1 and p16
immunoblots; n = 3. k, ER:HRasV12 IMR90 cells were scored for β-gal
positivity; n = 4, over 200 cells. Bars, mean ± s.e.m. (i, j), s.d. (k);
NS, non-significant; *P < 0.05; **P < 0.0005; ***P < 0.0001; unpaired
two-tailed Student’s t-test.

Extended Data Figure 9 | Additional characterization of lamin B1
substitution mutant. a-f, Related to Fig. 5a, in vitro translated proteins
were analysed by GST-LC3B pull-down. g, LC3 immunoprecipitation in
HEK293T cells transfected as indicated. The remaining interaction with the
mutant is probably due to the endogenous lamin B1 that interacts with LC3
and the mutant, as shown in j. h, i, IMR90 cells were imaged under confocal
microscopy and quantified. Bars, mean ± s.d.; n = 4, over 200 cells; *P < 0.05,
**P < 0.005, ***P < 0.0001; unpaired two-tailed Student's t-test. j, HEK293T
transfected cells were analysed by immunoprecipitation. k, ER:HRasV12
IMR90 cells were induced with OHT and harvested for immunoblotting. l,
IMR90 cells were quantified for β-gal positivity. Bars, mean ± s.d.; n = 4, over
200 cells; *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001, NS,
non-significant; one-way ANOVA coupled with Tukey’s post hoc test.

Extended Data Figure 10 | Lamin B1 370-458 fragment extends cellular
lifespan. a, In vitro translated proteins were analysed by GST-LC3B
pull-down. b, ER:HRasV12 IMR90 cells were quantified for β-gal positivity.
Bars, mean ± s.d.; n = 4, over 200 cells; *P < 0.05; NS, non-significant;
one-way ANOVA coupled with Tukey’s post hoc test. c, d, Related to Fig. 5f,
representative images of β-gal. e, Related to Fig. 5g, cells were fixed and
stained with DAPI. CCFs are indicated by arrows.

Conformational control of DNA target cleavage by CRISPR-Cas9

Samuel H. Sternberg, Benjamin LaFrance, Matias Kaplan & Jennifer A. Doudna


Cas9 is an RNA-guided DNA endonuclease that targets foreign 
DNA for destruction as part of a bacterial adaptive immune system 
mediated by clustered regularly interspaced short palindromic 
repeats (CRISPR). Together with single-guide RNAs, Cas9 also
functions as a powerful genome engineering tool in plants and
animals, and efforts are underway to increase the efficiency and
specificity of DNA targeting for potential therapeutic applications.
Studies of off-target effects have shown that DNA binding is far
more promiscuous than DNA cleavage, yet the molecular cues
that govern strand scission have not been elucidated. Here we 
show that the conformational state of the HNH nuclease domain 
directly controls DNA cleavage activity. Using intramolecular 
Förster resonance energy transfer experiments to detect relative
orientations of the Cas9 catalytic domains when associated with 
on- and off-target DNA, we find that DNA cleavage efficiencies 
scale with the extent to which the HNH domain samples an activated 
conformation. We furthermore uncover a surprising mode of 
allosteric communication that ensures concerted firing of both Cas9 
nuclease domains. Our results highlight a proofreading mechanism 
beyond initial protospacer adjacent motif (PAM) recognition and
RNA-DNA base-pairing that serves as a final specificity checkpoint
before DNA double-strand break formation. 

Cas9 is a large, multi-domain protein that undergoes RNA-induced 
conformational changes to reach a DNA-binding-competent state.
Crystal structures of apo (ref. 13), single-guide RNA (sgRNA)-bound (ref. 14), and
sgRNA/DNA-bound (refs 15, 16) Cas9 from Streptococcus pyogenes (Fig. 1a, b)
have revealed distinct conformational states of the protein but failed
to explain its DNA cleavage mechanism, because in each structure the
HNH domain active site is positioned at least 30 Å away from the DNA
cleavage site (refs 15, 16). Furthermore, available structures could not explain
why DNA cleavage is precluded at stably bound off-target sites with 
incomplete RNA-DNA complementarity. We hypothesized that func- 
tionally important HNH conformational dynamics could influence 
the cleavage specificity of the Cas9-guide RNA enzyme complex. To 
test this possibility, we developed a Förster resonance energy transfer
(FRET)-based approach to investigate Cas9 structural changes in
response to binding sgRNA and DNA ligands. 

We generated a FRET construct to monitor Cas9 structural
rearrangements upon sgRNA binding (Fig. 1b). Starting with a
cysteine-free Cas9 variant, we introduced cysteine residues at positions
D435 and E945 near the hinge region and labelled these residues
with Cy3- and Cy5-maleimide dyes, generating Cas9hinge. Control
labelling reactions with cysteine-free Cas9 confirmed the conjugation
specificity, and doubly labelled Cas9 was fully functional for DNA
cleavage (Extended Data Fig. 1a-c). Measurements from available
structures revealed an expected distance change of ~60 Å upon sgRNA
and DNA binding (Extended Data Table 1), and indeed, when Cy3 of
sgRNA-bound Cas9hinge was excited at 530 nm, we observed a substantial
decrease in energy transfer compared with apo-Cas9hinge, as
evidenced by a relative increase in donor (Cy3) fluorescence relative
to acceptor (Cy5) fluorescence (Fig. 1c). The observed change scaled
with the molar ratio of sgRNA to Cas9, a mixture of donor-only and
acceptor-only labelled Cas9hinge showed no evidence of energy transfer,
and an sgRNA specific to Neisseria meningitidis Cas9 (ref. 17), which
significantly impairs S. pyogenes Cas9 binding (data not shown), elicited
a negligible change (Extended Data Fig. 2a-c). We conclude that the
change in fluorescence intensities resulted from an sgRNA-induced,
intramolecular conformational change in Cas9hinge.
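
The link between an inter-dye distance change and a change in energy transfer can be illustrated with the standard Förster relation, E = 1 / (1 + (r/R0)^6). The sketch below is illustrative only: the Förster radius of 55 Å and the example distances are assumed placeholder values, not numbers taken from this paper or from Extended Data Table 1.

```python
def fret_efficiency(distance_angstrom, forster_radius_angstrom=55.0):
    """Förster relation: E = 1 / (1 + (r / R0)**6).

    The default R0 is an assumed placeholder for a Cy3/Cy5-like dye pair.
    """
    if distance_angstrom < 0 or forster_radius_angstrom <= 0:
        raise ValueError("distance must be >= 0 and R0 must be > 0")
    return 1.0 / (1.0 + (distance_angstrom / forster_radius_angstrom) ** 6)


# Hypothetical inter-dye distances spanning a large rearrangement.
for r in (30.0, 55.0, 90.0):
    print(f"r = {r:5.1f} A  ->  E = {fret_efficiency(r):.3f}")
```

Because efficiency falls with the sixth power of distance, a rearrangement of tens of ångströms around R0 moves a dye pair between near-complete and near-absent energy transfer, which is why such a construct reports sensitively on lobe movement.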

Cas9hinge exhibited an ~70% decrease in energy transfer upon sgRNA 
binding as determined by (ratio)A, whereby the acceptor fluorescence
intensity via energy transfer is normalized to that via direct
excitation (Methods and Extended Data Fig. 2d). Target DNA binding
induced little further change in FRET (Fig. 1c, d), consistent with avail- 
able structural data (Extended Data Table 1). To identify the molecular 

Figure 1 | Full-length sgRNA drives inward lobe closure of Cas9.
a, Domain organization of S. pyogenes Cas9 (top) and X-ray crystal structure
of sgRNA/DNA-bound Cas9 (Protein Data Bank (PDB) accession number
4UN3, ref. 16) (bottom), with HNH domain omitted for clarity. BH, bridge
helix; REC, recognition; PI, PAM-interacting. b, Design of Cas9hinge FRET
construct. Measured distances between D435 and E945 in apo (PDB 4CMP,
ref. 13) and sgRNA/DNA-bound Cas9 structures are indicated. Inward lobe
closure is exemplified by movement of the BH (arrow). Regions of the PI
domain, sgRNA, and DNA are omitted for clarity. c, Fluorescence emission
spectra for Cas9hinge in the presence of the indicated substrates. d, (Ratio)A
data for Cas9hinge. Inset: schematic of full-length sgRNA coloured by motifs.
Error bars, s.d.; n = 3.

Figure 2 | FRET experiments reveal an activated conformation of the
HNH nuclease domain. a, Design of Cas9HNH-1 FRET construct. Measured
distances between S355 and S867 in the sgRNA/DNA-bound Cas9 structure (ref. 16)
and a model of the HNH domain docked at the cleavage site are indicated, as
are putative conformational changes of the HNH domain (arrow). The model
was generated using an HNH homologue structure (PDB 2QNG, ref. 21).


determinants that trigger conformational rearrangement of Cas9, we 
tested truncated variants of the sgRNA (Extended Data Table 2) and 
found that the 20-nucleotide target recognition sequence has a critical 
role in controlling the Cas9 conformational state (Fig. 1d). An sgRNA 
lacking the entire guide segment (Δguide1-20) generated a (ratio)A
value indistinguishable from apo-Cas9hinge while being more than 95%
bound under our experimental conditions (ref. 14), whereas sgRNAs containing
part of the 20-nucleotide guide segment partly restored the change
in (ratio)A. sgRNA variants lacking one or both hairpins at the 3′ end
(Δhairpins1-2) also generated intermediate (ratio)A values (Fig. 1d)
while retaining sub-nanomolar binding affinity to Cas9 (ref. 20), and
similar data were obtained with catalytically dead (D10A/H840A)
dCas9hinge (Extended Data Fig. 2e). We conclude that motifs at both
ends of the sgRNA are required to stabilize a closed state of Cas9, but
that in the case of Δhairpins1-2, a fully closed state is not required
for rapid cleavage kinetics (ref. 20). We propose that intermediate (ratio)A
changes reflect stable sgRNA-Cas9hinge complexes interconverting
between open and closed conformers.
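
As a rough illustration of the (ratio)A normalization used in these comparisons, the sketch below divides the acceptor emission attributable to energy transfer by the acceptor emission under direct excitation. It assumes that the donor contribution at the acceptor wavelength has been estimated separately and is simply subtracted. The function names and all intensity values are hypothetical, and this is not the authors' analysis code.

```python
def ratio_a(acceptor_em_donor_ex, donor_bleedthrough, acceptor_em_direct_ex):
    """Sensitized acceptor emission normalized to directly excited acceptor emission."""
    if acceptor_em_direct_ex <= 0:
        raise ValueError("direct-excitation intensity must be positive")
    return (acceptor_em_donor_ex - donor_bleedthrough) / acceptor_em_direct_ex


def percent_change(reference, value):
    """Signed percentage change of value relative to reference."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return 100.0 * (value - reference) / reference


# Hypothetical intensities (arbitrary units) for two states of one construct.
apo = ratio_a(acceptor_em_donor_ex=120.0, donor_bleedthrough=20.0, acceptor_em_direct_ex=200.0)
bound = ratio_a(acceptor_em_donor_ex=60.0, donor_bleedthrough=30.0, acceptor_em_direct_ex=200.0)
print(f"apo = {apo:.2f}, bound = {bound:.2f}, change = {percent_change(apo, bound):.0f}%")
```

Normalizing to direct acceptor excitation makes the value insensitive to sample concentration, so intermediate values can be compared across sgRNA variants.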

We next focused on the HNH nuclease domain. Since existing crystal
structures exhibit inactive HNH domain conformations (refs 15, 16), we
built a model for the putative activated state by docking a homologous
HNH-dsDNA crystal structure (ref. 21) onto the sgRNA/DNA-bound Cas9
structure (Extended Data Fig. 3a-d). We selected two pairs of positions
(S355-S867 and S867-N1054) whose inter-residue distances,
according to our model, would change substantially upon target DNA
binding (Fig. 2a, Extended Data Fig. 3e and Extended Data Table 1).
Cas9 labelled with Cy3 and Cy5 at these sites (Cas9HNH-1 and
Cas9HNH-2) retained nearly wild-type DNA cleavage activity (Extended
Data Fig. 1c).

We observed a substantial FRET increase for catalytically inactive 
dCas9HNH-1 upon target DNA binding relative to sgRNA alone (Fig. 2b),
and control experiments with non-target DNA or off-target DNA 
substrates containing either PAM or seed mutations failed to generate 
this change (Fig. 2b and Extended Data Table 2). We next monitored 
FRET with off-target DNA substrates containing mutations distal from 
the PAM, which retain high-affinity Cas9 binding. Remarkably,
the observed (ratio)A values decreased as the number of mismatches
increased (Fig. 2c), and these changes were not attributable to decreasing
occupancy of the sgRNA/DNA-bound complex: direct binding
assays indicate at least 89% of the dCas9HNH-1 population should be


b, Fluorescence emission spectra for dCas9HNH-1 in the presence of the
indicated substrates. Inset: (ratio)A values. c, (Ratio)A data for dCas9HNH-1.
Mismatches (mm) were introduced sequentially from the PAM-distal end of
the target. d, Cleavage rate constants using wild-type Cas9. ND, cleavage not
detected. e, (Ratio)A data for catalytically active Cas9HNH-1 and Cas9HNH-2.
Error bars in b-e, s.d.; n = 3.


bound to all tested DNA substrates, and increasing the concentration of 
dsDNA had no discernible effect on (ratio)A (Extended Data Fig. 4a, b).
Our results show that the HNH domain samples a conformational 
equilibrium with on-target DNA that is distinct from partly matching 
off-target DNA, and suggest that the high FRET state corresponds to 
an active HNH conformation at the cleavage site. 
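
The occupancy argument above can be related to a dissociation constant with the standard quadratic binding equation for a 1:1 complex. The following is a minimal sketch under stated assumptions: the 50 nM protein and 200 nM DNA concentrations are those given in the Methods, whereas the 15 nM dissociation constant is a hypothetical placeholder and not a measured value from this study.

```python
from math import sqrt


def fraction_protein_bound(protein_nm, ligand_nm, kd_nm):
    """Fraction of protein in a 1:1 complex, from the quadratic binding equation."""
    if protein_nm <= 0 or ligand_nm < 0 or kd_nm < 0:
        raise ValueError("concentrations must be non-negative and protein must be > 0")
    total = protein_nm + ligand_nm + kd_nm
    # Physically meaningful (smaller) root of the mass-action quadratic.
    complex_nm = (total - sqrt(total * total - 4.0 * protein_nm * ligand_nm)) / 2.0
    return complex_nm / protein_nm


# 50 nM Cas9 and 200 nM DNA as in the Methods; Kd = 15 nM is an assumed placeholder.
print(f"fraction bound = {fraction_protein_bound(50.0, 200.0, 15.0):.3f}")
```

With ligand in fourfold excess, occupancy stays high even for a moderately weakened affinity, which is consistent with the reasoning that reduced (ratio)A values are not explained by unbound protein.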

We suspected that altered conformational states of the HNH 
domain could explain which off-target DNA substrates are cleaved 
by Cas9. Substrates with at least 4 base-pair (bp) mismatches that 
elicited a low (ratio)A value were cleaved slowly, if at all (Fig. 2d and
Extended Data Fig. 4c), as observed previously (refs 22, 23). This indicates that
the inability to access the high FRET state associated with an activated 
HNH conformation precludes cleavage. Interestingly, substrates with 
only 1-3 bp mismatches at the distal end of the target sequence were 
cleaved at near wild-type rates despite having diminished (ratio)A
values relative to the on-target. This suggests that rapidly intercon- 
verting conformational states, one of which is the activated state, may 
still enable rapid cleavage. Truncated sgRNAs with shorter regions of 
target complementarity that exhibit enhanced fidelity in genome editing
experiments may similarly facilitate efficient on-target cleavage
without stabilizing an activated HNH conformation. Single-molecule 
experiments will be necessary to reveal these putative dynamics, which 
are unavoidably averaged in our ensemble measurements. 

We observed a similar pattern of (ratio)A changes using catalytically
active Cas9HNH-1, and the opposite trend of (ratio)A changes was observed
with Cas9HNH-2, a construct designed to undergo a high-to-low
FRET efficiency transition upon on-target DNA binding (Fig. 2e and 
Extended Data Figs 3e and 4d). These data suggest that positioning of 
the HNH domain is largely unaffected by actual strand scission, but 
instead reflects a conformational equilibrium that is particularly sen- 
sitive to RNA-DNA heteroduplex formation at the distal end of the 
target. These observations emphasize the importance of RNA-DNA 
complementarity throughout the target region, rather than only the seed 
sequence closest to the PAM, in controlling Cas9 cleavage specificity. 

The HNH and RuvC nuclease domains cleave target and non-target 
strands 3 bp upstream of the PAM, respectively. For partly unwound
off-target substrates with mismatches >10 bp further upstream, target
strand cleavage is precluded by conformational control of the
HNH domain. However, the mechanism by which RuvC domain- 
catalysed non-target strand cleavage is avoided remains unknown. 
conformational changes. a, Tested DNA substrates, with on-target (1) at top.
Matched and mismatched positions of DNA target strand sequences relative
to the sgRNA are coloured red and black, respectively, with the PAM in
yellow. Some substrates contain internal mismatches between the two DNA
strands; dashed lines indicate additional flanking sequences. Schematic at
bottom right depicts identical non-target strand substrates presented to the
RuvC nuclease domain in substrates 5 and 7. b, Non-target (black) and target
(red) strand cleavage time courses for the indicated DNA substrates using
wild-type Cas9. Exponential fits are shown as solid lines. c, (Ratio)A data for
Cas9HNH-1 (pink bars, left y axis) and non-target strand cleavage kinetics of
the RuvC domain (blue bars, right y axis) for the indicated DNA substrates.
d, Non-target and target strand cleavage time courses for the indicated DNA
substrates using wild-type Cas9. Exponential fits are shown as solid lines.
e, (Ratio)A data for Cas9HNH-1. Error bars in b-e, s.d.; n = 3.


We hypothesized that this activity would be sensitive to HNH domain 
conformational changes. We first separately measured HNH and RuvC 
domain cleavage rates for a panel of partly mismatched substrates and 
found that both strands were consistently cleaved in synchrony (Fig. 3a, b
and Extended Data Fig. 5a, b). We next used shorter DNA substrates
with or without internal mismatches, such that Cas9-mediated DNA
unwinding up to the site of an sgRNA-DNA mismatch would theoretically
present identical substrates to the RuvC domain active site (Fig. 3a).
After separately measuring non-target strand cleavage kinetics and
Cas9HNH-1 FRET, we observed a tight correlation between RuvC
domain cleavage activity and the presence of an activated HNH con- 
formational state (Fig. 3c and Extended Data Fig. 5c-e). This finding 
provides strong evidence that HNH conformational dynamics exert 
allosteric control over the RuvC nuclease domain. Furthermore, the 
RuvC domain could still effectively cleave the non-target strand of a 
bubbled substrate that induced an activated HNH conformation, but 
whose target strand could not be cleaved by the HNH domain because 
of mismatches in the seed (Fig. 3d, e). Together, these data argue that 
HNH conformational changes, but not HNH nuclease function, trigger 
RuvC domain nuclease activity. 
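
Cleavage rate constants of the kind compared here are typically extracted by fitting a single exponential, f(t) = A(1 - exp(-kt)), to the fraction of strand cleaved over time. The fitting procedure is not described in this section, so the sketch below substitutes a simple linearized least-squares estimate with a fixed, assumed amplitude A; a nonlinear fit would normally be preferred for real data. The function name and the synthetic time course are hypothetical.

```python
from math import exp, log


def fit_rate_constant(times, fractions, amplitude=1.0):
    """Estimate k in f(t) = A * (1 - exp(-k * t)) by least squares through the origin.

    Uses the linearization -ln(1 - f/A) = k * t with a fixed, assumed amplitude A.
    Points with t <= 0 or f outside [0, A) carry no usable information and are skipped.
    """
    numerator = 0.0
    denominator = 0.0
    for t, f in zip(times, fractions):
        if t <= 0 or not (0.0 <= f < amplitude):
            continue
        y = -log(1.0 - f / amplitude)
        numerator += t * y
        denominator += t * t
    if denominator == 0.0:
        raise ValueError("no usable data points")
    return numerator / denominator


# Synthetic, noise-free time course generated with k = 0.5 per minute.
times = [0.5, 1.0, 2.0, 4.0]
fractions = [1.0 - exp(-0.5 * t) for t in times]
print(f"estimated k = {fit_rate_constant(times, fractions):.3f} per min")
```

Fitting the target and non-target strand time courses separately and comparing the two rate constants is one way to express whether the strands are cut in synchrony.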

We wondered how Cas9 achieves this functional coupling. The HNH 
domain is inserted between RuvC domain motifs II and III, but linkers 
connecting both domains are consistently disordered in available crystal
structures and there are relatively few inter-domain contacts (refs 13, 15, 16)
(Extended Data Fig. 6a). We purified an HNH deletion construct,
ΔHNH-Cas9 (Extended Data Fig. 6a-c), that retained nearly wild-type
DNA binding activity while being defective in non-target strand
cleavage by the RuvC domain (Fig. 4a, b and Extended Data Fig. 6d). 
Thus, the HNH domain is required for RuvC nuclease domain activa- 
tion but is dispensable for RNA-guided DNA targeting. 

Finally, we sought to identify the basis of allostery between the HNH 
and RuvC domains. We hypothesized that two α-helices connecting
the HNH and RuvC III motifs (residues S909-N940), previously
shown to adopt an extended conformation and proposed to assist the
HNH domain in approaching the cleavage site (ref. 16), were instead acting
as a signal transducer (Extended Data Fig. 7a). We introduced a series
of proline residues to specifically disrupt this α-helix and found that
target strand cleavage kinetics by the HNH domain were minimally 
affected (Fig. 4c-e and Extended Data Fig. 7b, c). In stark contrast, 
RuvC domain nuclease activity was almost completely blocked with 
an E923P/T924P-Cas9 mutant, and this effect could be reversed with 
the corresponding alanine mutations (Fig. 4d, e and Extended Data 
Fig. 7c). The finding that this effect was not confined to highly con- 
served residues supports the idea that disruption of the helix-forming 
propensity of this region, and not specific point mutations, disabled 
the RuvC domain. We conclude that an intact extended α-helix acts as
an allosteric switch to communicate the HNH conformational change 
to the RuvC domain and activate it for cleavage. Understanding the 
precise mechanism of activation will probably require additional struc- 
tures of Cas9 in a pre-cleavage state, with the intact non-target strand 
substrate bound in the RuvC active site. 


Figure 4 | Mechanism of communication between the HNH and
RuvC nuclease domains to achieve concerted DNA cleavage.
a, Target DNA binding assay with dCas9 and ΔHNH-Cas9, resolved
by native polyacrylamide gel electrophoresis (PAGE) (top); for gel source
data, see Supplementary Fig. 1. Quantified data are below; binding fits are
shown as solid lines. b, Target DNA cleavage assay with dCas9, wild-type
(WT) Cas9, and ΔHNH-Cas9, resolved by denaturing PAGE.
S, substrate; NT, cleaved non-target strand; T, cleaved target strand.
c, Magnified view of the sgRNA/DNA-bound Cas9 structure (ref. 16) (top)
highlights two α-helices connecting the HNH domain carboxy (C) terminus
and RuvC III amino (N) terminus. Bottom shows sequence alignment of
this region, and residues mutated to proline or alanine are indicated (arrows).
d, Target DNA cleavage assay with the indicated Cas9 variants, resolved
by denaturing PAGE. e, Target (red) and non-target (black) strand cleavage
time courses with the indicated Cas9 variants (for WT-Cas9 data,
see Fig. 3b). Exponential fits are shown as solid lines. Error bars in a and e,
s.d.; n = 5 and 3, respectively. f, Model for conformational control of target
cleavage by CRISPR-Cas9.

Our data support a model in which the Cas9 endonuclease uses
multiple levels of regulation to ensure accurate target DNA cleavage 
(Fig. 4f and Supplementary Video 1). After identification of potential 
targets via PAM binding and directional DNA unwinding dependent on 
sgRNA-DNA complementarity, recognition of on-target DNA drives 
a conformational change in the HNH nuclease domain that enables 
productive engagement with the scissile phosphate. Importantly, this 
same structural transition triggers RuvC domain catalytic activity, 
ensuring concerted cleavage of both DNA strands. Partly complemen- 
tary off-target DNA sequences may stably bind Cas9, but by failing to 
drive HNH conformational changes, avoid cleavage. The recent crystal
structure of sgRNA/DNA-bound Cas9 from Staphylococcus aureus
(~17% sequence identity with S. pyogenes Cas9) also exhibits an inactive
HNH conformation, suggesting that conformational control of the
HNH domain is a general feature of all Cas9 enzymes. Furthermore, 
this proofreading mechanism is strikingly similar to the R-loop locking 
mechanism used by the RNA-guided targeting complex (Cascade) from 
type I CRISPR-Cas systems, in which RNA-DNA heteroduplex forma- 
tion at the PAM-distal end of the target exerts allosteric control over 
Cascade conformational rearrangements near the PAM-proximal end 
that are required for subsequent target cleavage. Beyond providing
fundamental insights into the mechanism of DNA interrogation by 
Cas9, our findings have important implications for the use of Cas9 as a 
genome engineering technology. For example, our data can explain why 
little cleavage occurs at off-target DNA sequences identified in chromatin 
immunoprecipitation followed by sequencing (ChIP-seq) experiments,
and suggest that DNA nicking by the native Cas9 enzyme is
disfavoured in cells owing to concerted cutting by the HNH and RuvC 
nuclease domains. Finally, our findings demonstrate an exciting oppor- 
tunity to use protein conformational changes that report on target DNA 
recognition for fluorescence-based readout of DNA binding in cells. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to allocation during 
experiments and outcome assessment. 

Cas9 and nucleic acid preparation. S. pyogenes Cas9 was cloned into a custom 
pET-based expression vector encoding an N-terminal His10-tag followed by
maltose-binding protein (MBP) and a TEV protease cleavage site. Point mutations 
were introduced using site-directed mutagenesis or around-the-horn PCR and 
verified by DNA sequencing. dCas9 refers to catalytically inactive (dead) Cas9 
containing D10A and H840A mutations. ΔHNH-Cas9 contained a deletion of
residues T769-K918 and replacement with a GGSGGS linker. The HNH domain 
for add-back experiments (Extended Data Fig. 6d) encoded residues N776-G907. 
All Cas9 variants were purified as described.

sgRNA templates were PCR amplified and cloned into EcoRI and BamHI 
sites in pUC19, and encoded full-length CRISPR RNA (crRNA) and trans- 
activating crRNA (tracrRNA) sequences connected via a GAAA tetraloop 
(Extended Data Table 2). sgRNAs were transcribed in vitro as described and
purified using 5-10% denaturing PAGE. 

DNA substrates (Extended Data Table 2) were prepared from commercially synthesized
oligonucleotides (Integrated DNA Technologies). DNA duplexes without
internal mismatches were prepared and purified by native PAGE as described.
DNA duplexes containing internal mismatches or overhangs were prepared by mixing
a 5× molar excess of one strand with its complementary strand in hybridization
buffer (20 mM Tris-Cl pH 7.5, 100 mM KCl, 5 mM MgCl2), heating at 95 °C for
1-2 min, and slow-cooling on the benchtop. For FRET experiments, the non-target
strand was in excess over the target strand; for biochemical cleavage experiments, 
the non-radiolabelled strand was in excess over the radiolabelled strand. 
Preparation of dye-labelled Cas9. Labelling reactions were conducted in Cas9 gel
filtration buffer (20 mM Tris-Cl pH 7.5, 200 mM KCl, 5% glycerol, 1 mM TCEP)
and contained 10 µM Cas9 and 200 µM Cy3- and Cy5-maleimide (GE Healthcare).
Dyes were initially dissolved in anhydrous DMSO before being mixed with Cas9,
and the final DMSO concentration did not exceed 5%. Reactions were incubated
in the dark for 2 h at room temperature (~22 °C) followed by incubation overnight
at 4 °C. Reactions were quenched by adding 10 mM DTT, and labelled Cas9
was separated from free dye by size-exclusion chromatography on a Superdex 200
10/300 column. Samples were then concentrated, snap frozen in liquid nitrogen,
and stored at −80 °C. Control labelling reactions contained either cysteine-free
Cas9 or only one of the two dyes.
FRET experiments. All fluorescence measurements were conducted at room temperature 
in reaction buffer (20 mM Tris-Cl pH 7.5, 100 mM KCl, 5 mM MgCl2, 5% 
glycerol, 1 mM DTT), supplemented with 50 µg ml⁻¹ heparin to reduce non-specific 
DNA binding”. Reactions (60 µl) with Cas9hinge (C80S/D435C/C574S/ 
E945C-Cas9 labelled with Cy3/Cy5) and dCas9hinge (Cas9hinge with additional 
nuclease-inactivating D10A/H840A mutations) contained either 50 nM or 100 nM 
Cas9, and, when present, a 10× and 4× molar excess of sgRNA and target DNA, 
respectively. Reactions (60 µl) with Cas9HNH-1 (C80S/S355C/C574S/S867C- 
Cas9 labelled with Cy3/Cy5), dCas9HNH-1 (Cas9HNH-1 with additional nuclease- 
inactivating D10A/H840A mutations), and Cas9HNH-2 (C80S/C574S/S867C/ 
N1054C-Cas9 labelled with Cy3/Cy5) contained 50 nM Cas9, and, when present, 
200 nM sgRNA and DNA unless otherwise indicated. 

We observed substantial aggregation of apo-Cas9 upon 10 min incubation at 
37°C, as indicated by apparent intermolecular FRET with a single-cysteine Cas9 
(C80S/C574S/S867C) that had been labelled with a mixture of Cy3- and Cy5- 
maleimide (data not shown). This aggregation could be completely avoided by 
incubating reactions for 10 min at room temperature instead, centrifuging reactions 
for 5 min at 16,000g and 4°C, and using the supernatant for subsequent fluores- 
cence measurements. This binding protocol was used for all reported FRET data. 
Reactions were kept at room temperature for about 10-100 min before acquisition 
of fluorescence spectra, and this variable time delay had no effect on the resulting 
data, even for reactions with catalytically active Cas9 (data not shown). 

Fluorescence measurements were collected with a 3 mm path-length quartz 
cuvette (Hellma Analytics) and a FluoroMax-3 (HORIBA Jobin Yvon), using 5 nm 
slit widths and 0.2 s integration time. For each sample, two fluorescence emission 
spectra were recorded: (1) the sample was excited at 530 nm and emitted light was 
collected from 550-800 nm in 1 nm increments; and (2) the sample was excited at 
630 nm and emitted light was collected from 650-800 nm in 1 nm increments. Data 
processing was conducted using FluorEssence software (HORIBA Jobin Yvon). 
Experiments were replicated at least three times, and the presented data are 
representative results unless stated otherwise. 

FRET analysis. The distance between donor and acceptor dyes can be directly 
calculated from the FRET efficiency, but accurately relating these variables requires 
knowledge of numerous complex parameters'®. Our labelling strategy resulted 
in a heterogeneous mixture of unlabelled, singly-, and doubly-labelled species, 
further complicating the analysis. We therefore report (ratio)A, as defined by 
refs 18, 19, whereby acceptor (Cy5) fluorescence via energy transfer is normalized 
against acceptor fluorescence via direct excitation (Extended Data Fig. 2d), without 
pursuing a more rigorous calculation of exact distances. (Ratio)A is directly 
proportional to FRET efficiency, and changes in (ratio)A across different experimental 
conditions serve as a proxy for conformational changes. 
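
For orientation, the textbook relation between dye separation and FRET efficiency can be sketched as follows. This is a minimal illustration of the standard Förster equation, not a calculation performed in this study; the Förster radius `R0_ASSUMED` (54 Å) is an assumed, illustrative value for a Cy3/Cy5 pair, and the function names are hypothetical.

```python
def fret_efficiency(r, r0):
    """Förster relation: E = 1 / (1 + (r / r0)**6), with r and r0 in the same units."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def fret_distance(e, r0):
    """Inverse relation: r = r0 * (1/E - 1)**(1/6), valid for 0 < E < 1."""
    if not 0.0 < e < 1.0:
        raise ValueError("E must lie strictly between 0 and 1")
    return r0 * (1.0 / e - 1.0) ** (1.0 / 6.0)


# Assumed Förster radius in Å (illustrative only; not reported in the text).
R0_ASSUMED = 54.0

# Example separations in Å, chosen to span short to long inter-dye distances.
for r in (21.0, 57.0, 79.0):
    print(f"r = {r:5.1f} Å  ->  E = {fret_efficiency(r, R0_ASSUMED):.3f}")
```

Because the efficiency falls off with the sixth power of distance, it is most sensitive to changes near the Förster radius, which is why a proportional readout such as (ratio)A can track conformational changes without exact distances.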

For each FRET construct, a donor-only (Cy3-labelled) sample was prepared 
and its emission spectrum in the apo state after 530 nm excitation collected. This 
spectrum was normalized to and subtracted from each experimental emission 
spectrum to generate an extracted fluorescence spectrum for the acceptor via 
energy transfer. The integrated area under this curve from 650-800 nm was calculated 
and divided by the integrated area under the curve of a spectrum resulting 
from direct acceptor excitation at 630 nm, resulting in (ratio)A, the enhancement of 
acceptor fluorescence due to FRET. Raw fluorescence emission spectra presented 
in the figures and Extended Data figures were normalized and smoothed using the 
Savitzky-Golay method, and all data analysis used Prism (GraphPad Software). 
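
The (ratio)A procedure described above can be sketched in a few lines. This is a hedged illustration on synthetic spectra, not the analysis code used in the study: the function names are hypothetical, the donor-only spectrum is assumed to be scaled to the sample at a single wavelength near the Cy3 emission maximum (`norm_nm=565` is an assumption), and simple trapezoidal integration stands in for whatever the analysis software does.

```python
import math


def integrate(wavelengths, values, lo, hi):
    """Trapezoidal area under values(wavelength), restricted to lo..hi (nm)."""
    pts = [(w, v) for w, v in zip(wavelengths, values) if lo <= w <= hi]
    return sum((w2 - w1) * (v1 + v2) / 2.0
               for (w1, v1), (w2, v2) in zip(pts, pts[1:]))


def ratio_a(wl_530, sample_530, donor_only_530, wl_630, sample_630,
            norm_nm=565, lo=650, hi=800):
    """Acceptor emission via energy transfer divided by emission via direct excitation."""
    # Scale the donor-only spectrum to the sample at the (assumed) donor peak.
    i = wl_530.index(norm_nm)
    scale = sample_530[i] / donor_only_530[i]
    # Subtract the donor contribution to extract acceptor emission due to FRET.
    extracted = [s - scale * d for s, d in zip(sample_530, donor_only_530)]
    return integrate(wl_530, extracted, lo, hi) / integrate(wl_630, sample_630, lo, hi)


def gauss(w, centre, width=15.0):
    """Synthetic emission band used only for the demonstration below."""
    return math.exp(-((w - centre) ** 2) / (2.0 * width ** 2))


# Synthetic spectra on the acquisition grids given in the text (1 nm steps).
wl_530 = list(range(550, 801))
wl_630 = list(range(650, 801))
donor_only = [3.0 * gauss(w, 565) for w in wl_530]                  # arbitrary scale
sample_530 = [2.0 * gauss(w, 565) + 0.5 * gauss(w, 670) for w in wl_530]
sample_630 = [gauss(w, 670) for w in wl_630]

demo_ratio = ratio_a(wl_530, sample_530, donor_only, wl_630, sample_630)
print(round(demo_ratio, 3))
```

In the synthetic example the acceptor band excited through the donor is half as intense as the directly excited band, so the computed ratio recovers that factor regardless of how the donor-only spectrum was scaled.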
DNA binding and cleavage assays. Biochemical assays were conducted essentially 
as described!’. Binding reactions used <0.1 nM 5′-[32P] DNA duplex substrates 
radiolabelled on both strands and a constant excess of 100 nM sgRNA in the presence 
of increasing concentrations of dCas9 or ΔHNH-Cas9. Cas9 and sgRNA were 
pre-incubated at 37 °C for 10 min in reaction buffer supplemented with 50 µg ml⁻¹ 
heparin before being incubated with DNA for ~30 min at room temperature. 
Reactions were resolved by 5% native PAGE (0.5× TBE, 5 mM MgCl2) at 4 °C and 
visualized by phosphorimaging (GE Healthcare). 

DNA cleavage experiments presented in Figs 2d and 4b, d and Extended 
Data Figs 1c and 6d used 5′-[32P]DNA duplex substrates radiolabelled on both 
strands; all other cleavage experiments used DNA duplex substrates with a single 
5′-[32P]-radiolabelled strand that had been annealed to a 5× molar excess of 
unlabelled complementary strand. Cas9 and sgRNA were pre-incubated at 37 °C for 
10 min in reaction buffer before adding DNA. Cleavage reactions were performed 
at room temperature and contained 1 nM DNA and 100 nM Cas9-sgRNA complex. 
Aliquots were removed at various time points and quenched by mixing with an 
equal volume of formamide gel loading buffer supplemented with 50 mM EDTA. 
Cleavage products were resolved by 10% denaturing PAGE and visualized by 
phosphorimaging (GE Healthcare). Reported pseudo-first-order rate constants (kobs) 
represent the population-weighted average from double-exponential fits (target 
strand cleavage data for proline mutants, Fig. 4e and Extended Data Fig. 7b, c) or 
the result from single-exponential fits (all other data). In some cases, where the 
observed fraction of cleaved DNA was <0.1 after 2 h, the exponential fit plateau 
was fixed at 0.75 to avoid overestimating the rate constant. 
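
The two rate-constant conventions described above can be illustrated with a short sketch. It is not the fitting procedure used in the study (which relied on Prism): the function names are hypothetical, the fixed-plateau fit is a simple linearized least-squares estimate rather than a nonlinear fit, and the amplitude-weighted mean is one assumed reading of "population-weighted average".

```python
import math


def fit_k_fixed_plateau(times, fractions, plateau=0.75):
    """Estimate k for f(t) = plateau * (1 - exp(-k t)) with the plateau held fixed.

    Linearizes to -ln(1 - f/plateau) = k * t and fits a line through the origin.
    Every fraction must be smaller than the plateau.
    """
    ys = [-math.log(1.0 - f / plateau) for f in fractions]
    return sum(t * y for t, y in zip(times, ys)) / sum(t * t for t in times)


def weighted_kobs(amp_fast, k_fast, amp_slow, k_slow):
    """Amplitude-weighted mean of the two rate constants from a double-exponential fit."""
    return (amp_fast * k_fast + amp_slow * k_slow) / (amp_fast + amp_slow)


# Synthetic slow time course (minutes): under 10% of the DNA is cleaved after 2 h.
k_true = 5e-4
times = [10.0, 30.0, 60.0, 120.0]
fractions = [0.75 * (1.0 - math.exp(-k_true * t)) for t in times]

k_fit = fit_k_fixed_plateau(times, fractions)
print(f"k_fit = {k_fit:.2e} per min")
print(f"weighted kobs = {weighted_kobs(0.6, 1.0, 0.4, 0.1):.2f}")
```

Fixing the plateau matters for slow reactions because, with so little product formed, a free plateau and the rate constant are strongly correlated and a fit can trade a small plateau for an inflated rate.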

Experiments were replicated at least three times, and presented data are repre- 
sentative results unless stated otherwise. 

Cas9 structural analysis. All structure figures were generated using PyMOL 
(Schrödinger). Cas9 molecules from distinct crystal structures were aligned using 
the RuvC and PI domains (root mean squared deviation ≈ 0.5-0.7 Å). To generate the 
modelled docked state for the HNH domain (Fig. 2a and Extended Data Fig. 3), 
nucleotides 12-13 of chain D of PDB 2QNC (endonuclease-VII-DNA structure) 
were first aligned to nucleotides 11-12 of chain C of PDB 4UN3 (sgRNA/DNA- 
bound Cas9 structure). A copy of the Cas9 HNH domain from PDB 4UN3 was 
then aligned to chain A of PDB 2QNC. Conservation rendering was done using a 
multiple sequence alignment of 250 Cas9 homologues and the ConSurf server (ref. 31). 


30. Sternberg, S. H., Haurwitz, R. E. & Doudna, J. A. Mechanism of substrate 
selection by a highly specific CRISPR endoribonuclease. RNA 18, 661-672 
(2012). 

31. Ashkenazy, H., Erez, E., Martz, E., Pupko, T. & Ben-Tal, N. ConSurf 2010: 
calculating evolutionary conservation in sequence and structure of proteins 
and nucleic acids. Nucleic Acids Res. 38, W529-W533 (2010). 
Extended Data Figure 1 | Biochemical preparation and DNA cleavage 
activity of dye-labelled Cas9. a, Size-exclusion chromatograms of Cy3/ 
Cy5-labelling reactions with cysteine-free Cas9 (C80S/C574S) or the two 
double-cysteine Cas9 variants used to generate Cas9hinge and Cas9HNH-1. 
Reactions contained 10 µM Cas9 and 200 µM Cy3- and Cy5-maleimide, and 
were separated on a Superdex 200 10/300 column (GE Healthcare). Cysteine- 
free Cas9 was unreactive. b, Sodium dodecyl sulphate-polyacrylamide 
gel electrophoresis (SDS-PAGE) analysis of unlabelled and dye-labelled 
Cas9 variants. The gel was scanned for Cy3 and Cy5 fluorescence (right) 
before being stained with Coomassie blue (left). For gel source data, see 
Supplementary Fig. 1. c, Representative radiolabelled DNA cleavage assay 
with wild-type (WT) Cas9 and doubly labelled Cas9 variants used in this 
study, resolved by denaturing PAGE (left); quantified data and exponential 
fits are shown on the right. S, substrate; NT, cleaved non-target strand; 
T, cleaved target strand. Error bars, s.d.; n = 3. 
Extended Data Figure 2 | Fluorescence control experiments with Cas9hinge 
and dCas9hinge, and representative analysis of fluorescence emission 
spectra to calculate (ratio)A. a, Fluorescence emission spectra of 50 nM 
Cas9hinge in the presence of increasing concentrations of full-length sgRNA. 
Protein and sgRNA concentrations were calculated under non-denaturing 
conditions using theoretical extinction coefficients. b, Fluorescence emission 
spectra of (1) Cy3-labelled Cas9hinge, (2) Cy5-labelled Cas9hinge, and (3) 
an equal mixture of Cy3-Cas9hinge and Cy5-Cas9hinge upon excitation 
at 530 nm. The minor fluorescence peak for Cy5 in the mixed sample 
results from residual absorbance of Cy5-Cas9hinge at 530 nm and not from 
intermolecular FRET (compare spectra 3 with 4, which is a sum of spectra 
1 and 2). c, Fluorescence emission spectra of Cas9hinge in the presence of 
sgRNA substrates specific to S. pyogenes (Spy) or N. meningitidis (Nme) Cas9. 
d, Determination of the (ratio)A parameter, which is proportional to FRET 
efficiency. Shown for apo-Cas9hinge are (1) an emission spectrum of Cy3/Cy5- 
Cas9hinge upon excitation of the donor at 530 nm; (2) an emission spectrum 
of donor-only Cy3-Cas9hinge upon excitation of the donor at 530 nm, 
normalized to 1; (3) the extracted fluorescence of the acceptor via energy 
transfer, obtained by subtracting 2 from 1; and (4) an emission spectrum of 
Cy3/Cy5-Cas9hinge upon direct excitation of the acceptor at 630 nm. (Ratio)A 
is calculated by dividing the integrated intensity (650-800 nm) of 3 by the 
integrated intensity of 4. e, (Ratio)A data for dCas9hinge in the presence of the 
same sgRNA substrates tested with nuclease-active Cas9hinge in Fig. 1e. Error 
bars, s.d.; n = 3. 
Extended Data Figure 3 | Modelling of the HNH domain docked at the 
cleavage site, and design of the Cas9HNH-2 FRET construct. a, The scissile 
phosphate and flanking nucleotides of a DNA substrate co-crystallized with 
the phage T4 endonuclease VII (endo VII; PDB 2QNC; left) were aligned 
with the scissile phosphate and flanking nucleotides of the DNA target 
strand in the sgRNA/DNA-bound Cas9 crystal structure (PDB 4UN3; 
middle). Structural alignment of the Cas9 HNH domain with endonuclease 
VII (middle) results in a model of how the Cas9 HNH domain docks at 
the cleavage site (right). Catalytic residues are labelled, target strands are 
shown in red and pink, and a magnesium ion is depicted as a blue sphere. b, 
Conservation rendering of the sgRNA/DNA-bound Cas9 crystal structure, 
generated using ConSurf, shows that the most highly conserved patches 
of the HNH domain, including the active site, are solvent-exposed in the 
observed conformation. The HNH domain is omitted from the view on 
the left for clarity. c, Magnified view of the HNH domain in its observed 
conformation (left) and the model for the docked state (right), coloured as 
in b. The DNA target strand fits snugly in a groove on the HNH domain in 
the model, with the most highly conserved patches located in the immediate 
vicinity of the scissile phosphate. DNA and sgRNA are coloured red and 
orange, respectively. d, The conformational flexibility of the HNH domain 
in available Cas9 crystal structures is revealed by structural alignment of 
the nuclease lobe (RuvC and PI domains) from two sgRNA/DNA-bound 
structures (PDB accession numbers 4UN3 and 4OO8) and the sgRNA-bound 
structure (PDB 4ZT0). The modelled docked state from a is shown. e, Design 
of Cas9HNH-2 FRET construct. Measured distances between ~N1054 and S867 
in the sgRNA/DNA-bound Cas9 structure and a model of the HNH domain 
docked at the cleavage site are indicated. Putative conformational changes of 
the HNH domain are shown with a black arrow. 
Extended Data Figure 4 | Evidence that variable (ratio)A values for 
dCas9HNH-1 reflect distinct conformational states/dynamics, and FRET 
data for Cas9HNH-2. a, DNA binding assay with dCas9 and either 
on-target DNA or off-target DNAs containing 2, 4, or 8-bp mismatches at the 
PAM-distal end. Binding fits are shown as solid lines and yield equilibrium 
dissociation constants (Kd) of 0.80, 6.7, 19, and 20 nM, respectively. Given 
these values, 99%, 96%, 89%, and 89% of dCas9 should be bound to 
DNA under the conditions used for FRET experiments in Fig. 2c (50 nM 
dCas9HNH-1, 200 nM DNA). b, (Ratio)A data for 50 nM dCas9HNH-1 in the 
presence of 1 µM sgRNA and either 200 nM, 400 nM, or 1 µM off-target 
DNAs containing 2- or 4-bp mismatches. Data for sgRNA only and 
on-target DNA are shown for comparison. c, DNA cleavage time courses 
for the indicated DNA substrates using wild-type Cas9. Exponential fits are 
shown as solid lines, and extracted rate constants are shown in Fig. 2d. 
d, Fluorescence emission spectra of Cas9HNH-2 in the presence of the 
indicated substrates. The inset shows (ratio)A values; mut, mutation. 
Error bars in a and b-d, s.d.; n = 3-5 and 3, respectively. 
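
The bound fractions quoted in the Extended Data Figure 4 legend (99%, 96%, 89% and 89%) can be reproduced from the reported dissociation constants with the standard quadratic solution for 1:1 binding. The sketch below is an illustration under that assumption (a single binding site, no competing equilibria); the function name is hypothetical and the legend does not state which equation was used.

```python
import math


def fraction_protein_bound(protein_nm, dna_nm, kd_nm):
    """Fraction of protein in complex for 1:1 binding (all concentrations in nM).

    Solves [PD]^2 - (P + D + Kd)[PD] + P*D = 0 for the physical (smaller) root.
    """
    total = protein_nm + dna_nm + kd_nm
    complex_nm = (total - math.sqrt(total ** 2 - 4.0 * protein_nm * dna_nm)) / 2.0
    return complex_nm / protein_nm


# Kd values from the legend (nM): on-target, then 2-, 4- and 8-bp mismatches.
kds = [0.80, 6.7, 19.0, 20.0]

# FRET conditions from the legend: 50 nM dCas9, 200 nM DNA.
percent_bound = [round(100 * fraction_protein_bound(50.0, 200.0, kd)) for kd in kds]
print(percent_bound)  # [99, 96, 89, 89]
```

The quadratic form is needed here because the protein concentration (50 nM) is well above the dissociation constants, so the simpler approximation that free DNA equals total DNA would slightly overestimate binding.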
Extended Data Figure 5 | Additional experimental support for 
dependence of RuvC nuclease activity on HNH conformational changes. 
a, Panel of DNA substrates tested in b, with on-target (1) at top. Matched 
and mismatched positions of DNA target strand sequences relative to the 
sgRNA are coloured red and black, respectively, with the PAM in yellow. 
Some substrates contain internal mismatches between the two DNA strands; 
dashed lines indicate additional flanking sequence. b, Kinetics of non-target 
(black) and target (red) strand cleavage for the indicated DNA substrates. 
c, Panel of DNA substrates tested in d and e, depicted as in a. d, (Ratio)A 
data for Cas9HNH-1 in the presence of the indicated DNA substrates. 
e, Non-target strand cleavage kinetics of the RuvC domain for the indicated 
DNA substrates. Error bars in b, d, e, s.d.; n = 3. 
Extended Data Figure 6 | Design, purification, and DNA cleavage activity 
of ΔHNH-Cas9. a, Domain organization of WT- and ΔHNH-Cas9 (top), 
showing the residues that were replaced with a GGSGGS linker to generate 
ΔHNH-Cas9. Magnified view of connections between the HNH domain and 
RuvC II and III motifs in the apo (left) and sgRNA/DNA-bound (right) Cas9 
crystal structures, as well as in the ΔHNH-Cas9 construct. Disordered linkers 
and the introduced GGSGGS linker are shown as dashed lines. b, Size-exclusion 
chromatograms of WT- and ΔHNH-Cas9 using a Superdex 200 16/60 
column (GE Healthcare). c, SDS-PAGE analysis of dCas9 (D10A/H840A), 
WT-Cas9, ΔHNH-Cas9, and the purified HNH domain (residues 776-907). 
Expected molecular masses are 159 kDa, 159 kDa, 142 kDa, and 16 kDa, 
respectively. For gel source data, see Supplementary Fig. 1. d, Representative 
radiolabelled DNA cleavage assay with WT-Cas9, ΔHNH-Cas9, ΔHNH- 
Cas9 in the presence of excess HNH domain, and HNH domain alone, 
resolved by denaturing PAGE. 
Extended Data Figure 7 | Structural analysis and perturbation of the 
HNH-RuvC III linker. a, Molecules A (left) and B (right) of the sgRNA/ 
DNA-bound Cas9 crystal structure (PDB 4OO8). Molecule A has an ordered 
HNH domain and HNH-RuvC III linker, whereas these are both disordered 
in molecule B; the missing density for the HNH domain is replaced with 
the modelled docked state (right). Another prominent difference is the 
N-terminal region of the RuvC III motif (blue helices), which rearranges 
from a helix-loop-helix in molecule A into an extended helix in molecule 
B. Proline pairs were inserted to prevent formation of this extended helix. 
b, Target (red) and non-target (black) strand cleavage time courses with the 
indicated Cas9 variant. Exponential fits are shown as solid lines. c, Kinetics 
of target (red) and non-target (black) strand cleavage for the indicated Cas9 
mutants. ND, cleavage not detected. Error bars in b and c, s.d.; n = 3. 
Extended Data Table 1 | Measured distances between residues labelled with FRET pairs 

                                              Inter-residue distance* 
Structure used                            D435-E945    S355-S867    S867-N1054 
Apo (4CMP)                                21 Å         79 Å         6 Å 
sgRNA-bound (4ZT0)                        78 Å         81 Å         7 Å 
sgRNA/DNA-bound (4OO8 mol A)              77 ņ        61 Å         34 Å‡ 
sgRNA/DNA-bound (4UN3)                    83 Å         59 Å         28 Å§ 
sgRNA/DNA-bound, HNH docked state||                    21 Å         57 Å§ 

*Distances were measured between Cα atoms of the indicated residues, except where indicated, for the denoted structures (PDB accession numbers in parentheses). 
†E945 is disordered in the structure; an average of measured distances to T941 and I950 is reported. 
‡N1054 is disordered in the structure; an average of measured distances to T1048 and I1063 is reported. 
§N1054 is disordered in the structure; an average of measured distances to I1050 and K1059 is reported. 
||The docked state for the HNH domain was generated using PDB accession numbers 4UN3 and 2QNC. 
Extended Data Table 2 | RNA and DNA substrates used in this study 

Description / Sequence* 

λ1-targeting sgRNA† 
    5′-GACGCAUAAAGAUGAGACGCGUUUUAGAGCUAUGCUGUUUUGGAAACAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUGGAUC-3′ 

λ1-targeting sgRNA, Nme Cas9‡ 
    5′-GACUGACGCAUAAAGAUGAGACGCGUUGUAGCUCCCUUUCUCAUUUCGGAAACGAAAUGAGAACCGUUGCUACAAUAAGGCCGUCUGAAAAGAUGUGCCGCAACGCUCUGCCCCUUAAAGCUUCUGCUUUAAGGGGCAUCGUUUAUUGCUCGUGCGCUGGAUC-3′ 

λ1-targeting sgRNA, Δguide1-5 
    5′-GUAAAGAUGAGACGCGUUUUAGAGCUAUGCUGUUUUGGAAACAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUGGAUC-3′ 

λ1-targeting sgRNA, Δguide1-10 
    5′-GAUGAGACGCGUUUUAGAGCUAUGCUGUUUUGGAAACAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUGGAUC-3′ 

λ1-targeting sgRNA, Δguide1-15 
    5′-GACGCGUUUUAGAGCUAUGCUGUUUUGGAAACAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUGGAUC-3′ 

λ1-targeting sgRNA, Δguide1-20§ 
    5′-GGUUUUAGAGCUAUGCUGUUUUGGAAACAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGCACCGAGUCGGUGCUUUUUUUGGAUC-3′ 

λ1-targeting sgRNA, Δhairpin1 
    5′-GACGCAUAAAGAUGAGACGCGUUUUAGAGCUAUGCUGUUUUGGAAACAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUUAUCAACUUGAAAAAGUGGAUC-3′ 

λ1-targeting sgRNA, Δhairpins1-2 
    5′-GACGCAUAAAGAUGAGACGCGUUUUAGAGCUAUGCUGUUUUGGAAACAAAACAGCAUAGCAAGUUAAAAUAAGGCUAGUCCGUGGAUC-3′ 

MM on-target DNA, 5! -AGCAGAAATCTCTGCTGACGCATAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3 : 

Substrate 1, Fig. 3 3 | -TCGTCTTTAGAGACGACTGCGTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5 ' 

2 non-target DNA 5! -GAGTGGAAGGATGCCAGTGATAAGTGGAATGCCATGIGGGCTGTCAAAATTGAGC-3 ' 


3 '-CTCACCTTCCTACGGTCACTATTCACCTTACGGTACACCCGACAGTTTTAACTCG-5' 


M off-target DNA, 5 ' -AGCAGAAATCTCTGCTGACGCATAAAGATGAGACGCTCGAGTACAAACGTCAGCT-3 ' Description Sequence * 
PAM mutation 3 | -TCGTCTTTAGAGACGACTGCGTATTTCTACTCTGCGAGCTCATGTTTGCAGTCGA-5 ' 
M off-target DNA, 5 ' -AGCAGAAATCTCTGCTGACGCATAAAGATGAGTGCGTGGAGTACAAACGTCAGCT-3 ' M off-target DNA, 5 ' -AGCAGAAATCTCTGCTCTGCCATAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3 ' 
seed mutation 3'-TCGTCTTTAGAGACGACTGCGTATTTCTACTCACGCACCTCATGTTTGCAGTCGA-5 ' Substrate 2, ED Fig. 5 3'-TCGTCTTTAGAGACGAGACGGTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5 ' 
F GACG, ; 
M! off-target DNA, 5! -AGCAGAAATCTCTGCTCACGCATAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3 ' MM off-target DNA, 5‘ -AGcaGAAATCTCTGCT@*”GcaTAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3 
1-bp mismatch 3 ' -TCGTCTTTAGAGACGAGTGCGTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5 ' Substrate 3, ED Fig. 5 3 | -TCGTCTTTAGAGACGAGACGGTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5 ' 
MM off-target DNA, 5 ' -AGCAGAAATCTCTGCTCTCGCATAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3 ' M off-target DNA, 5 ' -AGCAGAAATCTCTGCTC"CCcaTAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3 ' 
2-bp mismatch 3 ' -TCGTCTTTAGAGACGAGAGCGTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5' Substrate 4, ED Fig. 5 3 '-TCGTCTTTAGAGACGACTGCGTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5' 
MM off-target DNA, 5! -AGCAGAAATCTCTGCTCTGGCATAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3 ' M off-target DNA, 5'- GACGCATAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3' 
3-bp mismatch 3! -TCGTCTTTAGAGACGAGACCGTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5 ' Substrate 5, ED Fig. 5 3 ' -TCGTCTTTAGAGACGACTGCGTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5 ' 
MM offttarget DNA, 5 ' -AGCAGAAATCTCTGCTCTGCCATAAAGATGAGACGCHEGAGTACAAACGTCAGCT-3' M off-target DNA, 5 ' -GACGCATAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3' 
4-bp mismatch 3 ' -TCGTCTTTAGAGACGAGACGGTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5 ' Substrate 6, ED Fig. 5 3! -CTGCGTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5 ' 
M off-target DNA, 5! -AGCAGAAATCTCTGCTCTGCGTTAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3 ' M off-target DNA, St. GACGcaTAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3' 
6-bp mismatch 3 ' -TCGTCTTTAGAGACGAGACGCAATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5 ' Substrate 7, ED Fig. 5 3 ' -TCGTCTTTAGAGACGAGACGGTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5 ' 
MM off-target DNA, 5' -AGCAGAAATCTCTGCTCTGCGTATAAGATGAGACGCTGGAGTACAAACGTCAGCT-3 ' MM off-target DNA, 5* -SACG caTAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3 ' 
8-bp mismatch 3 ' -TCGTCTTTAGAGACGAGACGCATATTCTACTCTGCGACCTCATGTTTGCAGTCGA-5 ' Substrate 8, ED Fig. 5 3t- GTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5' 
; i , GACG, , 
MM off-target DNA, 5 ' -AGCAGAAATCTCTGCTCTGCCATAAAGATGAGACG GTACAAACGTCAGCT-3 M off-target DNA, 5 1-CACG caTAAAGATGAGACG GTACAAACGTCAGCT-3 
Substrate 2, Fig. 3 3 | -TCGTCTTTAGAGACGAGACGGTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5 ' Substrate 9, ED Fig. 5 3'-GACGGTATTICTACTCTGCGACCTCATGTTTGCAGTCGA-5 ' 
M off-target DNA, 5'-accacaaarcrcrect®*°ScaTAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3 ' M off-target DNA, 5 ' -CTGCCATAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3' 
Substrate 3, Fig. 3 3' -TCGTCTTTAGAGACGAGACGGTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5 ' Substrate 10, ED Fig. 5 3° -GACGGTATTICTACTCTGCGACCTCATGTITGCAGTCGA-5 ' 
MM off-target DNA, 5" -°TSCcATAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3 ' M off-target DNA, 5! -°TGCcaTAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3 ' 
Substrate 4, Fig. 3 3! -CTGCGTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5' Substrate 11, ED Fig. 5 3'-CTGCGTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5' 
M off-target DNA, 5'-  CATAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3' M1 off-target DNA, 5'- CATAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3 ' 
Substrate 5, Fig. 3 3' -CTGCGTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5’ Substrate 12, ED Fig.5 | 3'-TCGTCTTTAGAGACGACTGCGTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5 ' 
M off-target DNA, 5 -GACGcaTAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3 ' MM off-target DNA, 5'- _ CATARAGATGAGACGCTGGAGTACAAACGTCAGCT-3' 
Substrate 6, Fig. 3 3'-GACGGTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5' Substrate 13, ED Fig. 5 3 -CEGCOTATERCTACTCTOCGACCTCATGTINOCAGTOGA-5* 
MM offttarget DNA, 5'-  CATAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3' ! off-target DNA, 5'- CATAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3' 
Substrate 7, Fig. 3 3 '-GACGGTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5 ' Substrate 14, ED Fig. 5 3 ' -TCGTCTTTAGAGACGAGACGGTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5 ' 
i Ge ‘ 
MM off-target DNA, 5 ' -AGCAGAAATCTCTGCTGACGCATAAAGATGAGACSPGGAGTACAAACGTCAGCT-3 MM off-target DNA, 5'-  CATAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3' 
Substrate 8, Fig. 3 3' -TCGTCTTTAGAGACGACTGCGTATTTCTACTCTGGCACCTCATGTTTGCAGTCGA-5' Substrate 15, ED Fig. 5 3'-  GTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5' 
' ACG : 
M off-target DNA, 5 ' -AGCAGAAATCTCTGCTGACGCATAAAGATGAG GTACAAACGTCAGCT-3 M off-target DNA, Ste CATAAAGATGAGACGCTGGAGTACAAACGTCAGCT-3' 
Substrate 9, Fig. 3 3 ' -TCGTCTTTAGAGACGACTGCGTATTTCTACTCACGCACCTCATGTTTGCAGTCGA-5 ' Substrate 16, ED Fig. 5 3! -GACGGTATTTCTACTCTGCGACCTCATGTTTGCAGTCGA-5'' 


*sgRNA guide sequences and matching DNA target strand sequences are shown in red. PAM sites (5′-NGG-3′) are highlighted in yellow on the non-target strand. Internal mismatches in select DNA substrates 
are denoted by misaligned text on the non-target strand. 

†All sgRNA constructs contain remnants of the BamHI sequence on the 3′ end resulting from run-off in vitro transcription. 

‡sgRNA specific to N. meningitidis (Nme) Cas9 contains an additional 3′ extension, which does not affect activity (data not shown), for purposes unrelated to this study. 

§Δguide1-20 sgRNA contains an extraneous 5′-G from in vitro transcription. 
Crystal structure of the RNA-dependent RNA 
polymerase from influenza C virus 


Narin Hengrung, Kamel El Omari, Itziar Serna Martin, Frank T. Vreede, Stephen Cusack, Robert P. Rambo, 
Clemens Vonrhein, Gérard Bricogne, David I. Stuart, Jonathan M. Grimes & Ervin Fodor 


Negative-sense RNA viruses, such as influenza, encode large, 
multidomain RNA-dependent RNA polymerases that can both 
transcribe and replicate the viral RNA genome’. In influenza virus, 
the polymerase (FluPol) is composed of three polypeptides: PB1, 
PB2 and PA/P3. PB1 houses the polymerase active site, whereas 
PB2 and PA/P3 contain, respectively, cap-binding and endonuclease 
domains required for transcription initiation by cap-snatching”. 
Replication occurs through de novo initiation and involves a 
complementary RNA intermediate. Currently available structures 
of the influenza A and B virus polymerases include promoter 
RNA (the 5’ and 3’ termini of viral genome segments), showing 
FluPol in transcription pre-initiation states**. Here we report the 
structure of apo-FluPol from an influenza C virus, solved by X-ray 
crystallography to 3.9 Å, revealing a new ‘closed’ conformation. The 
apo-FluPol forms a compact particle with PB1 at its centre, capped 
on one face by PB2 and clamped between the two globular domains 
of P3. Notably, this structure is radically different from those of 
promoter-bound FluPols**. The endonuclease domain of P3 and 
the domains within the carboxy-terminal two-thirds of PB2 are 
completely rearranged. The cap-binding site is occluded by PB2, 
resulting in a conformation that is incompatible with transcription 
initiation. Thus, our structure captures FluPol in a closed, 
transcription pre-activation state. This reveals the conformation 
of newly made apo-FluPol in an infected cell, but may also apply 
to FluPol in the context of a non-transcribing ribonucleoprotein 
complex. Comparison of the apo-FluPol structure with those of 
promoter-bound FluPols allows us to propose a mechanism for 
FluPol activation. Our study demonstrates the remarkable flexibility 
of influenza virus RNA polymerase, and aids our understanding of 
the mechanisms controlling transcription and genome replication. 

FluPol is a highly flexible protein complex; however, the conforma- 
tional states it can adopt are uncharacterized. Understanding the nature 
of these conformational states is central to determining the regula- 
tory mechanisms of this enzyme. To this end, we have determined the 
structure of FluPol from influenza C virus® (FluPolc), in the absence of 
promoter RNA. We expressed all three individual subunits of FluPolc 
in insect cells by infection with a single baculovirus construct. FluPolc 
purified from this system was active in both replication and transcrip- 
tion initiation (Extended Data Fig. 1). We crystallized apo-FluPolc in 
two different crystal forms (Extended Data Table 1), and solved its 
structure at 3.9 Å (Extended Data Fig. 2) and 4.3 Å resolution. 

Our model of FluPolc (Fig. 1) comprises 711 of the 754 residues of 
PB1 (94.3%), 762 out of 774 for PB2 (98.4%) and 693 out of 709 for 
P3 (97.7%). FluPolc forms a relatively compact structure (Fig. 1a, b). 
P3 folds into two domains connected by a long linker (Fig. 1c): an 
amino-terminal endonuclease domain (P3endo) and a C-terminal 
domain (P3C), which sandwiches PB1 at the heart of the molecule. PB1 
has the canonical right-hand-like polymerase fold, possessing palm, fingers 
and thumb subdomains with additional N- and C-terminal extensions 
(PB1N-ext and PB1C-ext) that facilitate interactions with the other 
subunits (Fig. 1d). The thumb of PB1 is reinforced by P3C. The priming 
loop of PB1, believed to facilitate de novo replication initiation’, 
is not visible in our structure and is probably disordered. PB2 stacks 
against one face of PB1, contacting both domains of P3. PB2 comprises 
nine domains: the N-terminal PB1 interaction domain (PB2N-ter), PB2N1, 
PB2N2, PB2lid and PB2mid domains, a cap-binding domain (PB2cap), 
a linker domain (PB2cap-627 linker), the 627 domain (PB2627) and a 
C-terminal nuclear localization signal (NLS) (PB2NLS) domain (Fig. 1e). 
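
The model completeness figures quoted above follow directly from the residue counts. A minimal check, using only the numbers given in the text (the variable names are illustrative):

```python
# (residues built in the model, residues in the full-length subunit), from the text
modelled = {"PB1": (711, 754), "PB2": (762, 774), "P3": (693, 709)}

# Percentage of each subunit present in the model, to one decimal place.
coverage = {name: round(100.0 * built / total, 1)
            for name, (built, total) in modelled.items()}

# Overall completeness across the heterotrimer.
built_all = sum(built for built, _ in modelled.values())
total_all = sum(total for _, total in modelled.values())

print(coverage)
print(f"overall: {built_all} of {total_all} residues "
      f"({100.0 * built_all / total_all:.1f}%)")
```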

The fold of each FluPolc domain is very similar to its counterpart in 
FluPolA and FluPolB, even though the sequence identity between these 
polymerases is only ~30% (Extended Data Table 2 and Supplementary 
Fig. 1). The average root mean squared deviation (r.m.s.d.) values of 
Cα atoms between equivalent superposed domains of FluPolc and 
FluPolA, or of FluPolc and FluPolB, are 1.6 Å or 1.5 Å, respectively, 
demonstrating that the FluPol fold is conserved across influenza A, B 
and C viruses. All key active site residues within FluPolc are structurally 
conserved, and we confirmed, by mutation, that FluPolc shares 
common mechanisms with FluPolA (Extended Data Fig. 3a). The PB1 
subunits of FluPol A, B and C belong to a structural grouping that most 
closely resembles the polymerases of Reoviridae and Cystoviridae/ 
Flaviviridae (Extended Data Fig. 3b). 

However, there are substantial differences between apo-FluPolc and 
the activated structures for promoter-bound FluPolA and FluPolB. Most 
striking are the position of P3endo and the arrangement of the C-terminal 
domains of PB2 (Fig. 2, Supplementary Video 1 and Extended Data 
Table 3). Thus, PB2627, which in FluPolA houses a crucial polymorphism 
(Glu627Lys) for the determination of viral host range and pathogenicity®, 
lies level with the endonuclease domain in the apo structure (Fig. 
2a), whereas in the activated structures it lies close to PB1palm (Fig. 2b). 
The PB2mid and PB2cap-627 linker domains are rearranged en bloc, by a 
rotation of 140° and a translation of 30 Å, between the apo and activated 
conformations. PB2cap also changes; in the apo structure, it is tucked in 
between the PB1palm and PB2cap-627 linker, while in the activated structures 
it does not extensively contact other domains. Finally, PB2NLS packs 
between PB1C-ter helix α23 and P3endo helix α7 in apo-FluPolc (Fig. 3a), 
but is near the base of the PB1palm in the activated structures (Fig. 2b). 
The movement of PB2NLS amounts to a rotation of 130° and a translation 
of 90 Å. The regions rearranged within PB2 match those that are 
disordered in the FluB2 structure’, lying immediately downstream of a 
conserved glycine (PB2 residue 255 in FluPolc). 

The buried area between PB2 and P3 (5,000 Å²) is more extensive in 
the apo than the activated, promoter-bound structures, reflecting new 
contacts between PB2 and P3endo (Fig. 3a and Extended Data Fig. 4a, b). 
Additionally, the extreme C terminus of PB2 is visible in our apo 
Figure 1 | Structure of FluPolc. a, b, Two views of the structure of the 
FluPolc heterotrimer, coloured according to subunit (PB1, orange; PB2, 
green; P3, blue). The cap-binding pocket and endonuclease active site are 
shown as blue and red spheres, respectively. In a, the PB1 catalytic aspartates, 


structure (Extended Data Fig. 4b), forming a helix (α30) that packs 
against the back face of P3endo (near P3 helices α2, α3 and α7). There 
are also many new contacts between PB2 and PB1 (Fig. 2 and Extended 
Data Fig. 4c). 

One important consequence of the arrangement seen in apo-FluPolc 
is that the cap-binding pocket is occluded (Fig. 3b). PB2cap is folded in on 
the rest of the subunit, facing residues 520-535 from the PB2 ap-627 linker 
domain. This is consistent with the observation that promoter RNA is 
required for FluPol cap-binding and endonuclease activity”®, Thus, the 
structure we observe represents a closed, pre-activation state of FluPol 
and suggests that the viral RNA (vRNA) promoter causes the rearrange- 
ments necessary to form the activated structure. 

Alternatively, the structure of FluPolc could indicate a fundamental 
conformational difference between FluPols from different influenza 
viruses. To clarify this, we performed small angle X-ray scattering 
(SAXS) experiments with FluPolg, as these allowed us to distinguish 
between closed and activated FluPol conformations (Extended Data 
Fig. 5a). The observed scattering profiles from FluPolc were similar 
to a profile calculated from the FluPol, crystal structure, indicating 
that FluPolc can adopt the same activated conformation (Extended 
Data Fig. 5b). Thus, the change we see in the FluPolc crystal is not 
an influenza virus type difference. However, promoter RNA was not 
required for the activated conformation to be detected, indicating that 
changes between apo and promoter-bound structures need not exclu- 
sively be caused by RNA binding. This suggests that the energy barrier 
between different FluPol conformations is low. Indeed, when placed 
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residues 446 and 447, are also highlighted red. The position of PB2 residue 
649 (equivalent to PB2 residue 627 in FluPol,) is marked by a purple sphere. 
c-e, Structures of FluPolc subunits P3 (c), PB1 (d) and PB2 (e), coloured and 
labelled by domain. f, Domain maps of each FluPolc subunit. 


into a phosphate-based buffer, FluPolc adopted a currently uncharac- 
terized conformation that was even more open than that of the promot- 
er-bound structures (Extended Data Fig. 5c). These results suggest that 
FluPol may be poised between several different conformations, with 
only subtle environmental changes needed for a particular conforma- 
tion to be favoured. 

In line with this assessment, differences around the promoter-binding
site between the apo-FluPolC and promoter-bound structures
are small. Minor changes are evident around the pocket that binds the
intra-base-paired hook structure of the 5′ strand of the vRNA promoter
(Extended Data Fig. 6); however, sequence alignments suggest that
these differences are influenza-virus-type-specific. More interesting
are the differences around the binding site for the 3′ strand of the vRNA
promoter. In the apo structure, PB2 helix α4, PB2N1 and the associated
region of PB1thumb lie ~5 Å further away from the polymerase core than
in the promoter-bound structures (Fig. 3c). This change is transmitted
to the neighbouring PB1C-ext-PB2N-ter interaction domain through
PB1 helix α22, resulting in a 20° rotation of this domain between the
apo and vRNA promoter-bound conformations (Fig. 3d). Since the
PB1C-ext-PB2N-ter domain lies next to PB2NLS in apo-FluPolC (Fig. 3a),
this rotation could trigger the movement of PB2NLS from its apo position,
leading to the subsequent massive reorganization of FluPol after
vRNA-promoter binding (Supplementary Video 2).

Notably, only one currently reported FluPol structure (FluB2) contains
a fully ordered vRNA promoter (in the others, the 3′ vRNA strand
is either truncated or partially disordered). However, this does not
display a stable activated conformation, as the C-terminal two-thirds
of PB2 are not resolved. We suggest that this is because initial binding
of the vRNA promoter, into a resting position away from the active site,
generates a dynamic equilibrium between closed and activated conformations.
The activated structure is only seen when the 3′ end of the 3′
vRNA promoter strand is either not present or disordered. Hence,
the activated conformation might only be fully stabilized when this 3′
end is released from its resting position to enter the polymerase active
site. To test this hypothesis, we compared the ability of a full-length
or truncated (lacking four nucleotides at the 3′ end of the 3′ strand)
vRNA promoter to stimulate FluPolC cap-dependent cleavage activity
(Fig. 4). We reasoned that stabilization of an activated over a closed
conformation would enhance capped-RNA cleavage, as the cap-binding
pocket in PB2 becomes more accessible. In line with this, we observed
a significant enhancement in capped-RNA cleavage in the presence of
the truncated promoter RNA (Fig. 4). The relative inefficiency of the
full-length vRNA promoter in stimulating cleavage supports our assertion
that initial promoter binding results in a closed/activated equilibrium.
The mechanism behind this may involve the PB1 β-ribbon (177-212 in
FluPolA), which is disordered in the apo-FluPolC structure, but adopts
different conformations in the activated and FluB2 structures.

Figure 2 | Comparison of apo-FluPolC with promoter-bound FluPolA.
a, Apo-FluPolC, depicted in the same orientation and colouring as in Fig. 1b,
but with the C-terminal domains of PB2 coloured as in Fig. 1e. Domains
that do not change between apo and promoter-bound conformations are
depicted as semi-transparent. b, Promoter-bound FluPolA, shown as in
a. c, d, The PB2 subunits of apo-FluPolC (c) and promoter-bound FluPolA
(d), depicted as in a and b.

In summary, we have solved the structure of the RNA polymerase
from an influenza C virus in the absence of RNA, uncovering a closed
conformation accessible to FluPol. Our structure explains the observation
that FluPol in the absence of promoter RNA is unable to perform
cap-snatching, and we propose a mechanism for how the vRNA
promoter might bring about FluPol activation. However, the closed
conformation captured here may have a wider functional relevance,
because it could still be accessible to FluPol bound to a fully ordered
vRNA promoter that does not enter the active site. Therefore, in the
context of a non-transcribing viral ribonucleoprotein complex (RNP),
containing FluPol, RNA and nucleoprotein, FluPol may well adopt this
closed conformation. In addition, dependent on stabilization of the PB1
priming loop, the closed conformation that we observe might still allow
de novo initiation, as this is not dependent on cap-snatching. Thus,
the conformation that we observe, in addition to being a transcription
pre-activation state, could be relevant during genome replication
initiation. This would allow the activity of FluPol within an RNP to be
regulated by other viral factors and host proteins.

Our work underlines the tremendous flexibility of this protein complex.
This flexibility offers an explanation for the differences between
several low-resolution electron microscopy reconstructions of
RNP-associated FluPol, as well as explaining why the promoter-bound
structures do not fit well into these reconstructions. Furthermore, since
negative-sense RNA virus polymerases share a common organization,
with a central polymerase core surrounded by various functional modular
appendages, the conformational flexibility revealed here might be a
theme among all of these polymerases and not just particular to FluPol.

Figure 3 | Critical changes between apo-FluPolC and promoter-bound
FluPolA. a, Equivalent views of FluPolC (left) and FluPolA (right), showing
domain arrangement differences. The inset shows the arrangement of P3endo
within the FluPolC structure. b, Close-up of the FluPolC (left) and FluPolA
(right) cap-binding domains. PB2 residues 520-535 in FluPolC are coloured
dark orange. The cap-binding pocket is shown with blue spheres. c, d, Two
views of a superposition of apo-FluPolC and FluPolA, with FluPolC coloured
as in Fig. 1 and FluPolA in lighter colours. 5′ and 3′ promoter RNAs in
the FluPolA structure are coloured pink and yellow, respectively. Arrows
highlight differences between the two conformations.

Figure 4 | Capped-RNA cleavage assays with FluPolC. a, Representative
autoradiograph of a capped-RNA cleavage assay. In each reaction,
radiolabelled capped RNA was incubated at 30 °C for 2 h with FluPolC and
the indicated strands of the vRNA promoter. b, Quantification of cleavage,
expressed as the percentage of cleaved to total RNA, from three replicates of
this assay, performed with the same polymerase preparation. Mean cleavage
percentage is plotted. Error bars show s.d. Asterisk indicates a significant
difference between cleavage with full or truncated vRNA promoter (n = 3,
P = 0.0003, two-tailed t-test).
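The quantification described in the Figure 4 legend (cleaved RNA as a percentage of total RNA, mean and s.d. over three replicates, then a two-sample t-test) can be sketched as follows. The band intensities are hypothetical placeholders, not the measured data, and the pooled-variance form of the t statistic is an assumption, because the legend specifies only a two-tailed t-test.

```python
from math import sqrt
from statistics import mean, stdev


def percent_cleaved(cleaved, total):
    """Cleaved RNA expressed as a percentage of total RNA in the lane."""
    return 100.0 * cleaved / total


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return t, na + nb - 2


# Hypothetical (cleaved, total) band intensities for three replicates each.
full_length = [percent_cleaved(c, t) for c, t in [(12, 100), (14, 100), (13, 100)]]
truncated = [percent_cleaved(c, t) for c, t in [(38, 100), (41, 100), (40, 100)]]

t_stat, dof = pooled_t(truncated, full_length)
print(f"full length: {mean(full_length):.1f} +/- {stdev(full_length):.1f} %")
print(f"truncated:   {mean(truncated):.1f} +/- {stdev(truncated):.1f} %")
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

With n = 3 per condition the test has four degrees of freedom; the P value would then be read from the t distribution.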


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

Protein expression and purification. The three subunits of the influenza
C/Johannesburg/1/1966 virus polymerase (PB1: AAF89738, PB2: AAF89739, P3:
AAF89737) were co-expressed in Sf9 cells from codon-optimized genes (GeneArt)
cloned into a single baculovirus using the MultiBac system. Expression and
purification of FluPolC proceeded as previously described for FluPolA, except that the
gel filtration buffer used was 0.5 M NaCl, 25 mM HEPES-NaOH, pH 7.5 and 10%
(v/v) glycerol. For crystallization and storage, protein purified in this buffer was
supplemented with 0.5 mM TCEP, and 10 mM MgCl2 or 10 mM CaCl2.
Crystallization, data collection and structure determination. Crystals of FluPolC,
belonging to two different space groups, grew from sitting-drop vapour-diffusion
experiments at 20 °C, set up using a protein:precipitant ratio of 2:1. In these
experiments, 5 mg ml⁻¹ protein was mixed with either 70% (v/v) Morpheus G2
(Molecular Dimensions), supplemented with 0%-1% 1 M NaOH, to generate
P4₃2₁2 crystals; or with crystal-seeds and 0.2 M NaCl, 0.1 M Na-HEPES, pH 7.5
and 25% (w/v) PEG 4000, for P2₁2₁2₁ crystals. For heavy atom derivatization,
P4₃2₁2 crystals were soaked in a solution of gold(I) potassium cyanide dissolved
in mother liquor, for 2-3 h at 20 °C. Crystals were cryo-protected using 25% (v/v)
glycerol in crystallization buffer, before flash-cooling in liquid nitrogen, and data
collection on beamlines I03 and I04 at the Diamond Light Source, Didcot, UK.
The beam size was matched to the crystal size and data were collected on a Pilatus
6M detector at a wavelength of 0.9763 Å (tetragonal native), 1.0350 Å (tetragonal
derivative) and 0.9795 Å (orthorhombic native). Data collection statistics are shown
in Extended Data Table 1. Data were processed using Xia2 (ref. 22) and HKL2000
(ref. 23). Initial phases were obtained by single isomorphous replacement with
anomalous scattering (SIRAS), using data from native P4₃2₁2 and gold-derivatized
crystals. The P4₃2₁2 data used at this stage were collected earlier (at a wavelength of
0.8634 Å) than those subsequently used in refinement. Heavy atoms were located
with SHELX and phases improved by two-fold non-crystallographic averaging
(the crystallographic asymmetric unit contained two heterotrimers) and solvent
flattening (solvent content 76%) using Phenix.autosol. The tetragonal and the
orthorhombic data were sharpened to 40 Å² and 36 Å², respectively. The P2₁2₁2₁
crystals were solved by molecular replacement (program Phaser), using the
P4₃2₁2 structure as the search model. As expected, the orthorhombic crystals
possessed four heterotrimers in the crystallographic asymmetric unit, allowing
phase improvement using non-crystallographic symmetry (NCS) averaging and
solvent flattening using the general averaging program (GAP) (D.I.S. and J.M.G.,
unpublished observations). The published fragments of FluPolA (PDB accessions
4IUJ, 4AWH, 4CB4, 3A1G and 2VY7) were fitted by eye using Coot, which was
used for all model building. Comparison with the complete FluPolA and FluPolB
structures (4WSB and 4WSA, respectively), aided by the anomalous scattering
from the sulphur atoms as markers, allowed us to build and refine complete models
for FluPolC. This provided a total of six independent views of the polymerase.
Performing superpositions of these demonstrated that the molecule adopts a
virtually identical conformation across all copies from both crystal forms (mean
pairwise r.m.s.d. in Cα was 0.94 Å between all pairs of molecules across both space
groups). Refinement (Extended Data Table 1) used BUSTER aided by NCS and
initially phase restraints, and REFMAC with secondary structure restraints using
PROSMART.
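The mean pairwise r.m.s.d. quoted above averages over every pair of the six independent copies (15 pairs). A minimal sketch of that bookkeeping is given below. It assumes the Cα coordinates have already been superposed and matched residue by residue; the superposition step itself is not shown, and the toy coordinates are hypothetical.

```python
from itertools import combinations
from math import sqrt


def rmsd(a, b):
    """R.m.s.d. between two equal-length lists of (x, y, z) coordinates.

    Assumes the two coordinate sets are already superposed and that
    atom i in `a` corresponds to atom i in `b`.
    """
    if len(a) != len(b):
        raise ValueError("coordinate lists must have the same length")
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return sqrt(squared / len(a))


def mean_pairwise_rmsd(models):
    """Average r.m.s.d. over all unordered pairs of models."""
    pairs = list(combinations(models, 2))
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)


# Hypothetical three-atom "models"; real input would be matched C-alpha traces.
base = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x + 1.0, y, z) for x, y, z in base]

n_pairs_for_six_copies = len(list(combinations(range(6), 2)))
print(n_pairs_for_six_copies)
print(mean_pairwise_rmsd([base, shifted, base]))
```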

SAXS experiments. SAXS measurements were performed on beamline B21 at
Diamond Light Source, Didcot, UK. Samples were prepared onsite using a Shodex
Kw-403 size exclusion column and Agilent HPLC. Approximately 40-60 µl of sample
were collected for SAXS at 20 °C using a sample-to-detector distance of 3.9 m
and X-ray wavelength of 1 Å. Samples were exposed for 300 s in 10 s acquisition
blocks. Images were corrected for variations in beam current, normalized for
exposure time and processed into 1D scattering curves using GDA and DAWN. Buffer
subtractions and all other subsequent analysis were performed with the program
ScAtter (http://www.bioisis.net/scatter). Samples were checked for radiation
damage by visual inspection of the Guinier region as a function of exposure time.
Data analysis. Figures and videos were prepared using PyMOL
(http://www.pymol.org) and Chimera. Structural comparisons used SHP.

Polymerase activity assays. For the cap-dependent cleavage and transcription
assays, FluPolC (400 ng per reaction) was incubated for 2 h at 30 °C with or without
(as indicated) NTPs (1 mM ATP, 0.5 mM each CTP/UTP/GTP), radiolabelled
capped 20-nucleotide or 11-nucleotide RNAs, 0.6 µM each 5′ and 3′ vRNA promoter
strands, in a reaction buffer containing 7.5 mM MgCl2, 1.0 mM TCEP, 2 U µl⁻¹
RNasin (Promega), 20 mM HEPES-NaOH, pH 7.5, 100 mM NaCl and 5% (v/v)
glycerol. For the de novo initiation and elongation assays, FluPolC (400 or 800 ng per
reaction, as indicated) was incubated for 2-3 h at 30 °C with 2.5 mM adenosine and
0.075 µM [α-³²P]GTP or 1 mM ATP, 0.5 mM each CTP/UTP, 0.1 mM GTP, 0.3 µM
[α-³²P]GTP and (as indicated) 0.6 µM each 5′ or 3′ vRNA promoter strands, in
the same reaction buffer as above. The reaction volume was 4 µl for all reactions.
Products were denatured by boiling (98 °C, 5 min) after the addition of formamide
(4 µl) and separated on a denaturing 20% polyacrylamide gel, with the indicated size
markers. Products were visualized by autoradiography. For all activity assays except
the cleavage assays, the sequences of the promoter RNA oligonucleotides used were:
5′-AGCAGUAGCAAGGAG-3′ (5′ vRNA) and 5′-CUCCUGCUUCUGCU-3′
(3′ vRNA). The sequences of the RNAs used in the capped-RNA cleavage assays
were 5′-AGCAGUAGCAAGGGG-3′ (5′), 5′-UAUACCCCUGCUUC-3′ (3′ truncated)
or 5′-UAUACCCCUGCUUCUGCU-3′ (3′ full length).
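As a quick consistency check on the two 3′ promoter strands listed for the cleavage assays, the short sketch below confirms that the truncated strand is the full-length strand minus its last four 3′ nucleotides, matching the description in the main text. The sequences are taken from the paragraph above; the function name is illustrative only.

```python
FULL_LENGTH_3P = "UAUACCCCUGCUUCUGCU"  # 3' strand, full length (written 5'->3')
TRUNCATED_3P = "UAUACCCCUGCUUC"        # 3' strand, truncated (written 5'->3')


def removed_from_3prime_end(full, truncated):
    """Return the nucleotides missing from the 3' end of `truncated`.

    Raises ValueError if `truncated` is not a 5'-anchored prefix of `full`.
    """
    if not full.startswith(truncated):
        raise ValueError("truncated strand is not a 3'-end truncation of the full strand")
    return full[len(truncated):]


missing = removed_from_3prime_end(FULL_LENGTH_3P, TRUNCATED_3P)
print(missing, len(missing))
```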

Capped and radiolabelled RNA was produced by incubating 5′ diphosphate
synthetic 20-nucleotide (5′-ppAAUCUAUAAUAGCAUUAUCC-3′) or
11-nucleotide (5′-ppGAAUACUCAAG-3′) RNA (Chemgenes), with [α-³²P]GTP,
vaccinia virus capping enzyme (NEB) and 2′-O-methyltransferase (NEB), following
the manufacturer's instructions. The resulting RNAs were gel purified before
use in the above assays.
Extended Data Figure 1 | Purification and characterization of FluPolC.
a, Elution profile of FluPolC, after affinity purification over IgG-sepharose,
from a size-exclusion chromatography column. Eluted protein was detected
by measuring the absorbance at 280 nm. b, Fractions corresponding to the
major peak eluting from the size-exclusion chromatography column were
mixed and analysed by SDS-PAGE on a 15% polyacrylamide gel, alongside
the indicated molecular mass markers. Protein was visualized by Coomassie
blue staining (PB1, 86.0 kDa; PB2, 87.2 kDa; P3, 81.9 kDa). c, Transcription
and replication initiation assays. Lanes 2-6 test for transcription initiation.
With the addition of vRNA promoter only (lanes 4 and 5), FluPolC can
cleave a capped and radiolabelled 20-nucleotide RNA, demonstrating
promoter-dependent endonuclease activity. Lane 6 shows that with the
addition of NTPs, this capped primer can be extended to produce a capped
transcript, thus demonstrating transcription initiation activity. This
result is confirmed by lanes 7-9, which test for extension of a capped and
radiolabelled 11-nucleotide RNA primer. Extension only takes place when
the polymerase is supplied with NTPs and promoter RNA (lane 9). Lanes
10-12 assay for replication initiation. Lane 12 shows that FluPolC (400 ng per
reaction) is able to synthesize ApG dinucleotide in a primer-independent
manner. This demonstrates de novo replication initiation activity. Uncapped
20-nucleotide and 11-nucleotide primers are used as size markers in lane 1.
The slow migration of the ApG dinucleotide compared to the markers is due
to the lack of phosphate groups on the 5′ end of this product. d, De novo
initiation and elongation assay. FluPolC (800 ng) was incubated for 3 h with
NTPs, [α-³²P]GTP and 5′ or 3′ vRNA promoter strands, as indicated. In
the presence of both promoter strands (lane 4), FluPolC is able to produce a
full-length copy of the template (14 nucleotides, corresponding to the major
band), demonstrating de novo replication initiation and elongation activity.
The minor slower and faster bands may correspond to non-templated
extension and premature termination products, respectively.
Extended Data Figure 2 | Data and model quality. Plots of CC*, CCfree and
CCwork against resolution, for the tetragonal crystal data set and model. The
CC* statistic assesses the signal present in the data, while CCfree and CCwork
provide an estimate of the agreement between data and model. A CC* value
of 0.87 for the highest resolution shell indicates that these data contain useful
information up to 3.9 Å. CCfree and CCwork are lower than CC*, showing
that the model does not overfit the data.
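The CC* value in this legend can be related to the half-data-set correlation CC1/2 reported in Extended Data Table 1. The sketch below assumes the standard relation CC* = sqrt(2·CC1/2 / (1 + CC1/2)); that this is the relation used for the figure is an assumption, as the legend does not state it.

```python
from math import sqrt


def cc_star(cc_half):
    """Estimate CC* from CC1/2 using CC* = sqrt(2*CC1/2 / (1 + CC1/2))."""
    if not 0.0 <= cc_half <= 1.0:
        raise ValueError("CC1/2 is expected to lie between 0 and 1 here")
    return sqrt(2.0 * cc_half / (1.0 + cc_half))


# Highest-resolution-shell CC1/2 for the tetragonal native data set
# (value taken from Extended Data Table 1).
tetragonal_native = cc_star(0.621)
print(f"CC* = {tetragonal_native:.3f}")
```

For a CC1/2 of 0.621 the relation gives a CC* of about 0.875, close to the 0.87 quoted in the legend.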
Extended Data Figure 3 | Functional and structural relationships of
FluPolC. a, Effect of amino acid mutations in FluPolC on transcription
and replication. Plasmids to express FluPolC subunits and NP along
with a plasmid expressing a negative-sense CAT reporter gene flanked
by the terminal non-coding sequences of the influenza C virus NS
gene segment were transfected into 293T cells (ATCC). Total RNA
was isolated using Trizol (Invitrogen) 30 h post transfection and viral
RNAs were analysed by primer extension using the following primers:
5′-CGCAAGGCGACAAGGTGCTGA-3′ (for detection of vRNA, yielding
a 127-nucleotide product) and 5′-ATGTTCTTTACGATGCGATTGGG-3′
(for detection of mRNA and complementary RNA (cRNA), yielding
98-102-nucleotide and 89-nucleotide products, respectively). Primer
extension products were analysed by 6% PAGE. Quantification of primer
extension analysis using phosphorimaging is shown below. The mean
and s.d. of three experiments are shown. Asterisks indicate a significant
difference from wild type (WT), which was set to 100% (*P < 0.05;
**P < 0.01, based on a two-sample t-test). A double mutation of two
aspartic acids, Asp446/Asp447, that align with Asp445/Asp446 of influenza
A virus PB1 found to be critical for activity, resulted in no detectable
activity in the context of FluPolC. Mutation of amino acid residues in the
PB2 cap-binding and P3 endonuclease domains that align with critical
amino acid residues in FluPolA resulted in undetectable accumulation
of viral mRNA, although most of these mutants were still able to replicate.
The exception was Phe416Ala in PB2, which inhibited both transcription and
replication, suggesting that this mutation might not only affect cap-binding
but overall PB2 folding, in agreement with previously observed inhibitory
effects of mutations in the cap-binding domain on replication. The
requirement of these amino acid residues for mRNA synthesis is consistent
with the hypothesis that FluPolC generates capped RNA primers for
transcription initiation in a manner similar to that of FluPolA. b, Structure-based
phylogenetic tree showing the relationship of PB1 from FluPolC
(C PB1) to other right-handed polymerases. Pairwise comparisons were
performed using SHP and a phylogenetic tree constructed using PHYLIP.
The branches are identified by the PDB accession code of the polymerase.
Extended Data Figure 4 | New subunit interfaces in apo-FluPolC. a, b, Two
views of the interaction interface between PB2NLS (green) and P3endo (blue).
Predicted polar contacts between the subunits are shown as dotted yellow
lines. c, The interface between PB2cap (green) and PB1palm (orange).
In all panels, residues at the interface were calculated using the ‘Protein
interfaces, surfaces and assemblies’ service PISA at the European
Bioinformatics Institute (http://www.ebi.ac.uk/pdbe/prot_int/pistart.html).
Extended Data Figure 5 | SAXS analysis of FluPolC. a, Calculated solution-state
SAXS profiles for the crystal structures of vRNA-FluPolA (activated
conformation) and apo-FluPolC (closed conformation). A distinguishing
difference between these profiles is the dip at q ≈ 0.1 Å⁻¹ in the FluPolC
curve. b, Solution-state SAXS profile of FluPolC, without added promoter
RNA, overlaid with the calculated curve for the vRNA-FluPolA structure.
The good match between these curves suggests that in this particular buffer
(0.5 M NaCl, 25 mM HEPES-NaOH, pH 7.5, 5% (v/v) glycerol), FluPolC
adopts the same globular conformation as the RNA-bound state.
c, Dimensionless Kratky plot of apo-FluPolC in the presence of 0.5 M NaCl,
25 mM HEPES-NaOH, pH 7.5 and 5% (v/v) glycerol (blue) or 100 mM KCl,
2% (w/v) sucrose and 100 mM sodium phosphate, pH 7.3, with (magenta)
or without (grey) 200 mM proline. Cross-hairs denote the Guinier-Kratky
point (1.732, 1.104), the peak position for an ideal, globular particle. As
indicated by the upward shift of the peaks in the dimensionless Kratky plot,
FluPolC is less globular in the presence of phosphate than it is in the 0.5 M
NaCl buffer. This effect can be lessened if proline is also present, potentially
owing to increased molecular crowding.
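The Guinier-Kratky point quoted in this legend follows from the Guinier approximation, I(q)/I(0) = exp(-(qRg)²/3). In a dimensionless Kratky plot the ordinate is (qRg)²·I(q)/I(0), which under this approximation peaks at qRg = √3 ≈ 1.732 with height 3/e ≈ 1.104. The sketch below locates that peak numerically; it illustrates the reference point only and is not the analysis performed with ScAtter.

```python
from math import e, exp, sqrt


def dimensionless_kratky(q_rg):
    """(q*Rg)^2 * I(q)/I(0), assuming the Guinier form I(q)/I(0) = exp(-(q*Rg)^2 / 3)."""
    return q_rg ** 2 * exp(-(q_rg ** 2) / 3.0)


# Scan q*Rg from 0.001 to 4.000 in steps of 0.001 and pick the maximum.
grid = [i / 1000.0 for i in range(1, 4001)]
peak_x = max(grid, key=dimensionless_kratky)
peak_y = dimensionless_kratky(peak_x)

print(f"peak at q*Rg = {peak_x:.3f}, height = {peak_y:.3f}")
print(f"analytical:   q*Rg = {sqrt(3):.3f}, height = {3 / e:.3f}")
```

Peaks shifted upward and to the right of this point, as described for the phosphate buffer, indicate a less compact particle.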
Extended Data Figure 6 | Differences at the 5′ vRNA promoter binding site. Superposition of apo-FluPolC (darker colours) and FluPolA (lighter colours)
structures, with sites of interest labelled.
Extended Data Table 1 | Data collection, phasing and refinement statistics

                          Tetragonal Native*        Tetragonal Derivative*    Orthorhombic Native*
Data collection
Space group               P4₃2₁2                    P4₃2₁2                    P2₁2₁2₁
Cell dimensions
  a, b, c (Å)             185.66, 185.66, 598.22    184.22, 184.22, 598.75    107.28, 217.50, 597.75
  α, β, γ (°)             90.00, 90.00, 90.00       90.00, 90.00, 90.00       90.00, 90.00, 90.00
Resolution (Å)**          100.6-3.9 (4.0-3.9)       127.3-6.9 (7.1-6.9)       80.9-4.3 (4.4-4.3)
Rsym or Rmerge            0.196 (3.586)             0.157 (0.665)             0.204 (0.993)
I/σI                      13.2 (1.3)                16.5 (1.8)                5.5 (1.1)
Completeness (%)          98.8 (96.9)               99.1 (89.0)               99.0 (93.1)
Redundancy                21.2 (20.6)               14.5 (3.7)                3.3 (2.9)
CC1/2                     (0.621)                   (0.612)                   (0.500)

Refinement
Resolution (Å)            50.0-3.9                                            50.0-4.3
No. reflections           90,335                                              90,390
Rwork/Rfree               0.286/0.326                                         0.316/0.368
No. atoms
  Protein                 34,720                                              69,371
Wilson B-factors (Å²)
  Protein                 205                                                 190
R.m.s. deviations
  Bond lengths (Å)        0.008
  Bond angles (°)         1.12

*The native data sets were each collected from a single crystal, whereas the derivative data set was produced by merging data from two crystals.
*Highest resolution shell is shown in parentheses.
**As described in ref. 44.
Extended Data Table 2 | Sequence identities between subunits of FluPol from C/Johannesburg/1/1966 and those from A/Little
yellow-shouldered bat/Guatemala/060/2010 or B/Memphis/13/2003, calculated using EMBOSS Stretcher

FluPol subunit    Sequence identity with C (%)
                  B
PB1               40.8
PB2               25.2
P3/PA             25.6
Extended Data Table 3 | Domain position differences between
apo-FluPolC and promoter-bound FluPolA, calculated using SHP

Domain                           Rotation (°)    Distance (Å)
PA endo/P3 endo                  140             19
PB2mid and PB2 cap-627 linker    141             29
PB2cap                           122             30
PB2 627                          163             79
PB2NLS                           134             93
Structural insight into substrate preference for TET-mediated oxidation

Lulu Hu, Junyan Lu, Jingdong Cheng, Qinhui Rao, Ze Li, Haifeng Hou, Zhiyong Lou, Lei Zhang, Wei Li,
Wei Gong, Mengjie Liu, Chang Sun, Xiaotong Yin, Jie Li, Xiangshi Tan, Pengcheng Wang, Yinsheng Wang,
Dong Fang, Qiang Cui, Pengyuan Yang, Chuan He, Hualiang Jiang, Cheng Luo & Yanhui Xu

DNA methylation is an important epigenetic modification.
Ten-eleven translocation (TET) proteins are involved in DNA
demethylation through iteratively oxidizing 5-methylcytosine (5mC)
into 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and
5-carboxylcytosine (5caC). Here we show that human TET1 and
TET2 are more active on 5mC-DNA than 5hmC/5fC-DNA substrates.
We determine the crystal structures of TET2-5hmC-DNA and
TET2-5fC-DNA complexes at 1.80 Å and 1.97 Å resolution, respectively.
The cytosine portion of 5hmC/5fC is specifically recognized by
TET2 in a manner similar to that of 5mC in the TET2-5mC-DNA
structure, and the pyrimidine base of 5mC/5hmC/5fC adopts an
almost identical conformation within the catalytic cavity. However,
the hydroxyl group of 5hmC and carbonyl group of 5fC face towards
the opposite direction because the hydroxymethyl group of 5hmC
and formyl group of 5fC adopt restrained conformations through
forming hydrogen bonds with the 1-carboxylate of NOG and N4
exocyclic nitrogen of cytosine, respectively. Biochemical analyses
indicate that the substrate preference of TET2 results from the
different efficiencies of hydrogen abstraction in TET2-mediated
oxidation. The restrained conformation of 5hmC and 5fC within the
catalytic cavity may prevent their abstractable hydrogen(s) adopting
a favourable orientation for hydrogen abstraction and thus result in
low catalytic efficiency. Our studies demonstrate that the substrate
preference of TET2 results from the intrinsic value of its substrates
at their 5mC derivative groups and suggest that 5hmC is relatively
stable and less prone to further oxidation by TET proteins. Therefore,
TET proteins are evolutionarily tuned to be less reactive towards
5hmC and facilitate the generation of 5hmC as a potentially stable
mark for regulatory functions.

Previous studies have shown that 5hmC is 10- to 100-fold more
abundant than 5fC/5caC and that its level is relatively high in neurons,
self-renewing and pluripotent stem cells, and greatly reduced in cancer
cells. The depletion of Tdg in mouse embryonic stem cells leads
to an accumulation of 5fC and 5caC by two- to tenfold, but no apparent
changes in 5hmC and 5mC levels, suggesting that thymine-DNA
glycosylase is not predominantly responsible for the different abundance
of 5hmC and 5fC/5caC. In vitro biochemical analyses indicate that
mouse Tet2 and Naegleria Tet-like protein possess higher activity for
5mC than for 5hmC/5fC, suggesting that TET enzymes might play
a major role in controlling the level of 5mC-oxidized derivatives.

To understand how TET proteins recognize 5hmC/5fC and iteratively
oxidize 5mC and its derivatives, we performed an in vitro
enzymatic activity assay using purified recombinant catalytic domain
of human TET1 or TET2 with the products detected by
liquid-chromatography-tandem mass spectrometry (LC-MS/MS) (Extended
Data Fig. 1 and Supplementary Tables 1 and 2). TET1 (12.5 µM) converted
89% 5mC to 47% 5hmC, 19% 5fC and 23% 5caC for 5mC-DNA
substrate (Fig. 1a). In contrast, it could only oxidize 25% 5hmC or 18%
5fC. TET2 (5 µM) oxidized almost all 5mC-DNA, but only 52% 5hmC-DNA
or 33% 5fC-DNA (Fig. 1b). At low protein concentration (1 µM), TET2
could still oxidize over 90% 5mC, but only 15% 5hmC or negligible 5fC
(Fig. 1c). Thus, human TET1 and TET2 both showed higher activity
on 5mC-DNA than on 5hmC/5fC-DNA substrates, which is consistent
with previous observations and suggests a conserved mechanism for
TET proteins.

We next detected product generation at different time points (Fig. 1d
and Supplementary Table 3). For 5mC-DNA substrate, TET2 (1 µM)
converted 73% 5mC to 70% 5hmC and less than 3% 5fC at 5 min. A
noticeable amount of 5fC only emerged at 10 min when 5hmC had
accumulated (10.5 µM, ~68% of 5mC substrate), and detectable
5caC emerged at 20 min when 5fC accumulated (3.5 µM, ~23% of 5mC
substrate). For 5hmC-DNA substrate, the level of 5fC was low at 5 min
after initiation of the oxidation, whereas 5caC only emerged at 20 min
when 5fC accumulated (1.2 µM, ~10% of 5hmC substrate) (Fig. 1e). No
5caC was detected when 5fC-DNA was used as the substrate (Fig. 1f).
The results indicate that 5hmC was the major product of TET-mediated
5mC oxidation under our experimental conditions and considerable
amounts of 5fC/5caC are not generated until 5hmC accumulates, which
is consistent with the observation that cellular 5hmC is relatively stable
and significantly more prevalent than 5fC/5caC.
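The paired concentrations and percentages in the time-course description allow a simple back-calculation of the modified-base concentration in each reaction. This is only an arithmetic cross-check on the numbers quoted above; the function name is illustrative, and the percentages are approximate as stated in the text.

```python
def implied_substrate_um(product_um, fraction_of_substrate):
    """Substrate concentration implied by a product level and its stated fraction."""
    return product_um / fraction_of_substrate


# 5mC-DNA reaction: 10.5 uM 5hmC is ~68% and 3.5 uM 5fC is ~23% of the substrate.
from_5hmc = implied_substrate_um(10.5, 0.68)
from_5fc = implied_substrate_um(3.5, 0.23)

# 5hmC-DNA reaction: 1.2 uM 5fC is ~10% of the substrate.
from_5hmc_reaction = implied_substrate_um(1.2, 0.10)

print(f"{from_5hmc:.1f} uM, {from_5fc:.1f} uM, {from_5hmc_reaction:.1f} uM")
```

The two estimates for the 5mC-DNA reaction agree to within rounding of the quoted percentages (about 15 µM).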

We next performed steady-state kinetic analyses for the TET2-mediated
oxidation of 5mC/5hmC/5fC-DNA substrates (Fig. 1g-i
and Supplementary Table 4). We optimized TET2 concentration
to ensure only one product was generated for each measurement
(Extended Data Fig. 1c-h). For example, 0.5 µM TET2 converted
5mC-DNA to 5hmC-DNA, but not 5fC/5caC-DNA, under the
experimental conditions. The kinetic analyses indicate that TET2
has higher activity for 5mC-DNA (kcat/Km = 4.42 × 10⁻³ µM⁻¹ s⁻¹)
than for 5hmC-DNA (kcat/Km = 0.70 × 10⁻³ µM⁻¹ s⁻¹) or 5fC-DNA
(kcat/Km = 0.35 × 10⁻³ µM⁻¹ s⁻¹).
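The substrate preference can be expressed as a ratio of the specificity constants. The sketch below uses only the three leading coefficients quoted above and assumes they share the same units and power of ten, so that both cancel in the ratio; the dictionary and function names are illustrative.

```python
# Leading coefficients of kcat/Km for each substrate, as quoted in the text.
# Assumption: all three values carry the same units and power of ten.
KCAT_OVER_KM = {"5mC": 4.42, "5hmC": 0.70, "5fC": 0.35}


def fold_preference(preferred, other):
    """Ratio of specificity constants for two substrates."""
    return KCAT_OVER_KM[preferred] / KCAT_OVER_KM[other]


for substrate in ("5hmC", "5fC"):
    ratio = fold_preference("5mC", substrate)
    print(f"5mC-DNA over {substrate}-DNA: {ratio:.1f}-fold")
```

On these numbers TET2 is roughly 6-fold more efficient on 5mC-DNA than on 5hmC-DNA and roughly 13-fold more efficient than on 5fC-DNA.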

A similar substrate preference was observed for 5mC/5hmC/ 
5fC-DNA substrates with different sequences (AT-rich/CG-rich) 
and lengths (26/58/100 base pairs (bp)) (Extended Data Fig. 2a-e 
and Supplementary Table 5). The presence of CpG-DNA or differ- 
ent products did not significantly inhibit TET2 activity, indicating 
that the substrate preference does not result from product inhibition 
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Figure 1 | TET proteins prefer 5mC-DNA, but not 5hmC/5fC-DNA as 
substrate. a~c, LC-MS/MS analyses of nucleoside hydrolytes for enzymatic 
assays using various DNA substrates and purified TET1 (a) or TET2 in 

two different concentrations (b, c). d-f, Enzymatic activities of TET2 on 
5mC-DNA (d), 5hmC-DNA (e) or 5fC-DNA (f) at different time points. 
g-i, Michaelis-Menten plots of the steady-state kinetics for TET2-mediated 
oxidation of 5mC/5hmC/5fC-DNA. To match the linear interval in the 


(Extended Data Fig. 2f-h and Supplementary Table 6). Taken 
together, TET1/2 has higher enzymatic activity for 5mC-DNA than
for 5hmC/5fC-DNA substrates, suggesting an intrinsic property and
conserved mechanism for TET-mediated oxidation. 

To investigate the mechanism for substrate preference of TET pro- 
teins, we first measured the DNA-binding affinities, which may affect 
activity of TET2 on various substrates. Fluorescence polarization meas- 
urements indicated that TET2 has comparable DNA-binding affinity 
to C/5mC/5hmC/5fC-DNA (Fig. 2a). The presence of Fe²⁺/Mn²⁺ and
NOG/2-OG did not affect the DNA-binding affinity (Extended Data 
Fig. 3 and Extended Data Table 1a). Surface plasmon resonance (SPR) 
measurements showed no significant difference in the association or 
dissociation constant for the dynamic interaction between TET2 and 
5mC/5hmC/5fC-DNA substrates (Fig. 2b-d and Extended Data Table 
1b). Thus, TET2 binds to 5mC/5hmC/5fC-DNA with comparable 
DNA-binding affinity, which is unlikely to result in a substrate pref- 
erence of TET2. 

To investigate whether 5hmC/5fC adopts a non-optimal conformation
or multiple conformations (some catalytically productive and
others not) within the catalytic cavity to hamper TET2-mediated oxidation,
we determined the crystal structures of TET2-5hmC-DNA and
TET2-5fC-DNA at 1.80 Å and 1.97 Å resolution, respectively (Fig. 3a
and Extended Data Table 2). The two complexes adopt similar overall 
fold to that of TET2-5mC-DNA (Protein Data Bank (PDB) accession 
number 4NM6)’ (Extended Data Fig. 4). Briefly, the Cys-rich domain 
and double-stranded β-helix (DSBH) domain together form a compact
globular fold, with the catalytic DSBH core located in the centre and 
two highly conserved loops stabilizing the DNA right above the DSBH 
core (Fig. 3b). 

TET2 binds to 5hmC-DNA and 5fC-DNA through extensive hydrogen
bonds and hydrophobic interactions (Fig. 3c and Extended Data
Fig. 5). TET2 recognizes 5hmC-DNA in a manner similar to that of 
5mC-DNA in TET2-5mC-DNA complex. Higher-resolution and 


first 2.5 min and generate only one product for all the reactions, 0.5 µM,
2 µM and 3 µM TET2 were used for the three measurements with 5mC-,
5hmC- and 5fC-DNA, respectively. The oxidized products were analysed by 
LC-MS/MS. Relative amounts of 5mC, 5hmC, 5fC and 5caC were calculated 
for each measurement according to standard curves of various cytosine 
derivatives. Error bars, s.d. for triplicate/duplicate experiments from three/ 
two independent assays for a-f and g-i, respectively. 


clearer electron density maps provide additional insight into the 
mechanism for substrate recognition by TET2. The guanine:hydrox- 
ymethylcytosine (G7:hmC7’) base pair of DNA forms a base-stacking 
interaction with residue Y1294 of TET2, and the hydroxymethyl group
of hmC7’ is not recognized by TET2. In addition, the endocyclic oxy- 
gen atom O2 of hmC7’ forms water-mediated hydrogen bonds with 
residues Y1295 and R1302 of TET2 (Extended Data Fig. 5). In the 
5fC-DNA structure, because hemi-formylated double-stranded DNA 
(dsDNA) was used for crystallization, it is C7’ that pairs with G7 of the 
CpG dinucleotide outside the catalytic cavity. Nevertheless, the G7:C7’ 
base pair of 5fC-DNA is stabilized by TET2 in a similar fashion to that 
observed in TET2-5mC/5hmC-DNA complex structures (Extended 
Data Fig. 5). 

Except for CpG dinucleotide, no base of DNA is specifically rec- 
ognized by TET2, suggesting that TET2 has no preference for DNA 
sequence apart from the CpG dinucleotide (Extended Data Fig. 5). 
Consistently, TET2 shows comparable enzymatic activity on DNA 
containing AT- or CG-rich sequences flanking the methyl-CpG dinu- 
cleotide (Extended Data Fig. 2 and Supplementary Table 5). The result 
agrees well with the genome-wide analyses, in which 5hmC/5fC/5caC 
occurs mainly on the CpG site but has no preference on its flanking 
sequence14,16,17.

The hydroxymethylcytosine or the formylcytosine is flipped out 
of the DNA double helix and inserted into the catalytic cavity. As 
observed in the TET2-5mC-DNA structure, the cytosine portion of
5hmC/5fC forms hydrogen bonds with residues H1904 and N1387 of 
TET2, and the interaction is further supported by base-stacking inter- 
action between residue Y1902 and the pyrimidine base of 5hmC/5fC 
(Fig. 3d–h and Extended Data Fig. 4e–g). Additional hydrogen bonds
were observed in the TET2-5hmC-DNA complex, including one 
between residue H1386 and endocyclic oxygen atom O2 of base 5hmC, 
and a water-mediated hydrogen bond between exocyclic amino (N4) 
nitrogen and residue T1393 of TET2 (Fig. 3d). In the TET2-5fC-DNA 
Figure 2 | DNA-binding affinity of TET2. a, The DNA-binding affinity
of TET2 measured by fluorescence polarization. Fluorescein amidite
(FAM)-labelled 18-bp C/5mC/5hmC/5fC-DNA (5 nM) was incubated with
increasing amounts of TET2. Error bars, s.d. for triplicate experiments from


complex, residue H1386 flips away from 5fC and has no direct contact
with the DNA substrate. A relatively weaker water-mediated hydrogen
bond (3.27 Å) is formed between base 5fC and residue T1393 of TET2
(Fig. 3e). 

The pyrimidine bases of 5mC/5hmC/5fC adopt almost identical
conformations within the catalytic cavity in the three compared complexes
(Fig. 3f–h). The network of TET2-DNA interactions (within and
outside the catalytic cavity) collectively offers the specific recognition
of 5mC/5hmC/5fC-pG dinucleotide by TET2. Notably, the cytosine
portion of 5hmC/5fC within the catalytic cavity adopts an almost 
identical conformation to that of 5mC in the TET2-5mC-DNA struc- 
ture (Extended Data Fig. 4d), suggesting that substrate recognition is 
unlikely to result in substrate preference of TET2. 

Structural comparison of 5mC/5hmC/5fC-DNA-TET2 indicates that 
the major difference exists in their 5mC derivative groups. Notably, the 
hydrophobic methyl group of 5mC directly points to the catalytic centre 
and has no contact with NOG or TET2 residues (Fig. 3f). In contrast, 
three independent experiments. b–d, SPR measurements of the interaction
between TET2 and biotinylated 26-bp DNA. The data were fitted with a
two-state binding model with binding affinity (Kd) indicated. RU, resonance
units.


the hydroxymethyl group of 5hmC adopts a restrained conformation
by forming a hydrogen bond (~2.6 Å) with the 1-carboxylate of NOG
(Fig. 3g), whereas the formyl group of 5fC is restrained by an intramolecular
hydrogen bond formed between the carbonyl group and the N4
exocyclic nitrogen of cytosine (Fig. 3h). As a result, the hydroxyl group
of 5hmC and the carbonyl group of 5fC face in opposite directions
in the catalytic cavity when the two structures are superimposed. The
conformational difference of 5mC/5hmC/5fC in these pre-catalysis 
complexes results from the intrinsic properties of their 5mC derivative 
groups. The above analyses suggest that such different intrinsic prop- 
erties may also lead to distinct behaviour of 5mC/5hmC/5fC during 
the catalysis, and thus result in distinct efficiency for TET2-mediated 
oxidation. 

Previous studies have proposed a consensus mechanism for 2-OG/ 
Fe(II)-dependent dioxygenases, which mainly involves four steps of 
reaction (Fig. 4a). It has been proposed that hydrogen abstraction is 
the rate-controlling step for oxidation mediated by AlkB18, which is


Figure 3 | Structure of TET2-5hmC-DNA complex. a, Colour-coded 
domain structure of the human TET2 catalytic domain and the sequences 
of 12-bp fully hydroxymethylated-DNA and hemi-formylated-DNA for 
crystallization. b, Ribbon representation of TET2-5hmC-DNA. NOG and 
the bases of DNA are shown in stick representations. An iron and three zinc 
cations are shown as red and grey balls, respectively. c, Interactions between 
TET2 and 5hmC-DNA with critical bases of DNA and residues of TET2 

are shown in stick representations. d, e, Close-up views for the recognition 
of 5hmC (d) and 5fC (e) by TET2. The 2Fobserved − Fcalculated simulated
annealing omit maps for residues involving 5hmC or 5fC recognition are
shown. The omit maps for 5hmC and 5fC are indicated for clarity. The maps
were calculated at 1.8 Å (TET2-5hmC-DNA) and 1.97 Å (TET2-5fC-DNA)
respectively, and contoured at 1.0σ. Note that all critical groups, including
the hydroxyl group of 5hmC and carbonyl group of 5fC, are well covered
by the map, indicating that the structure models were built correctly.
f–h, Close-up views for the recognition of 5mC (f, PDB 4NM6), 5hmC
(g) and 5fC (h) by the catalytic cavity of TET2, which is shown in surface
representation. The hydrogen bonds are indicated as dashed lines.
Figure 4 | Mechanism for substrate preference for TET-mediated
oxidation. a, Model for the oxidative reactions catalysed by TET proteins.
b, Decreased enzymatic activities of TET2 by substrate deuteration. The
assays were performed as in Fig. 1 using 5mC-DNA ([¹H]₃5mC-DNA)
and 5mC-DNA ([²H]₃5mC-DNA) as substrates. To avoid ²H exchange in
aqueous phase, the reactions were performed using minimized enzyme
and reaction times to prevent 5fC/5caC generation. Error bars, s.d. for
triplicate experiments from three independent assays. c, The comparative
kinetic traces from the reactions of TET2-Fe(II)-2-OG in the presence of
5mC/5hmC/5fC-DNA. The formation and decay of catalytic intermediate
for each reaction was monitored by stopped-flow absorption at 318 nm.
d, Hydrogens of 5mC derivatives are indicated in the structures of


structurally and mechanically similar to TET proteins”!*°. To test 
whether hydrogen abstraction is the rate-controlling step of TET2-
mediated substrate oxidation, we measured the enzymatic activity of
TET2 using regular 5mC-DNA ([¹H]₃5mC-DNA) and deuterated-
5mC-DNA ([²H]₃5mC-DNA) in which all of the hydrogen atoms of
the methyl group were replaced by deuterium. The introduction of
deuterium at the reactive position of 5mC-DNA substrate leads to a significant
decrease in the enzymatic activity (Fig. 4b and Supplementary
Table 7), indicating an obvious kinetic isotope effect. Notably, 0.25 µM
TET2 oxidized ~10% of [¹H]₃5mC-DNA into [¹H]₃5hmC-DNA but
showed undetectable activity towards [²H]₃5mC-DNA. When treated
with 0.5 µM TET2, 54% of [¹H]₃5mC-DNA was converted into
[¹H]₃5hmC-DNA whereas only 11% of [²H]₃5mC-DNA was oxidized.
The result is consistent with previous studies of taurine α-ketoglutarate
dioxygenase (tauD)*! and supports the idea that hydrogen abstraction 
is the key step for TET-mediated oxidation of 5mC. 
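
To put a rough number on the size of this effect, the two conversions reported at 0.5 µM TET2 (54% for the regular substrate and 11% for the deuterated one) can be compared directly. The sketch below is our own illustration, with a hypothetical function name. It compares single-time-point conversions rather than initial rates, so the result is an apparent ratio and should not be read as a measured kinetic isotope effect.

```python
def apparent_isotope_ratio(conversion_h: float, conversion_d: float) -> float:
    """Ratio of fractional conversions (regular / deuterated substrate).

    Both arguments are fractions between 0 and 1 taken at the same enzyme
    concentration and reaction time. This is an end-point comparison, not a
    ratio of rate constants.
    """
    if conversion_d <= 0:
        # An undetectable deuterated conversion gives no finite ratio.
        raise ValueError("deuterated conversion must be greater than zero")
    return conversion_h / conversion_d


if __name__ == "__main__":
    # Conversions reported in the text for 0.5 µM TET2.
    ratio = apparent_isotope_ratio(0.54, 0.11)
    print(f"apparent ratio at 0.5 µM TET2: {ratio:.1f}")
```

The 0.25 µM condition cannot be treated the same way, because conversion of the deuterated substrate was undetectable there.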

In the hydrogen abstraction step, a ferryl-oxo (Fe(IV) = O) inter- 
mediate (positive feature at 318 nm) is formed upon decarboxylation 
of 2-OG and oxidation is initiated (decay of the ferryl-oxo interme- 
diate) by abstraction of a hydrogen atom from the target carbon of 
the substrate”®?! (Fig. 4a). To test whether hydrogen abstraction 
contributes to the substrate preference of TET2, we measured the 
formation and decay of the ferryl-oxo intermediate for 5mC/5hmC/ 
5fC-DNA substrates using stopped-flow spectroscopy, assuming the 
318nm species represents a catalytically valid intermediate accord- 
ing to previous studies”!. The result shows different kinetic processes 
for TET-mediated oxidation of 5mC/5hmC/5fC-DNA. Comparison 
of the 318 nm kinetic traces indicates that the amplitude is signifi- 
cantly greater (more ferryl-oxo intermediate accumulation) for the 
reactions using 5hmC/5fC-DNA than for those using 5mC-DNA
TET2-5mC/5fC-DNA complexes (pre-catalysis form). The closest distance
between the abstractable hydrogen and Fe(II) is indicated as a dashed line.
e, The occupancy (percentage) of the hydrogen bond during 10 ns molecular
dynamics simulation of free 5fC. The intramolecular hydrogen bond
formed in 5fC is indicated as a yellow dashed line. f, Comparison of the free
energies of 5hmC in different conformations suggests that 5hmC also has
the tendency to form an intramolecular hydrogen bond. The occupancy
(percentage) of the intramolecular hydrogen bond is shown as in e. Note
that the conformation of one 5hmC (bottom) is similar to that of 5fC in
TET2-5fC-DNA, in which the abstractable hydrogen is away from the Fe(II)
if the low energy conformation of 5hmC is positioned into the active site.


(Fig. 4c and Extended Data Fig. 6). This feature persisted much longer 
(slower decay of the ferryl-oxo intermediate) for the reactions using 
5hmC/5fC-DNA than those using 5mC-DNA. For all the three reac- 
tions, decay but not formation of the ferryl-oxo intermediate takes 
much longer, supporting the hypothesis that the hydrogen abstraction 
after formation of ferryl-oxo intermediate accounts for the substrate 
preference of TET2. 

Previous experimental and computational studies have suggested 
that higher homolytic C-H bond dissociation energy (BDE) of sub- 
strates would lead to lower hydrogen abstraction efficiency*””*. We 
therefore performed the calculations and found that the C-H BDEs 
for the 5-substitution group of 5mC, 5hmC and 5fC do not strictly 
follow the order of abstraction efficiencies (Fig. 4c and Extended Data 
Table 3). The C-H BDE for the formyl group of 5fC is slightly higher
(~1 kcal mol⁻¹) than that of 5mC, but the C-H BDE for the hydroxymethyl
group of 5hmC is the lowest, suggesting that other factors may
influence the abstraction efficiency.

Structural analyses indicate that the abstractable hydrogen of 5fC
is relatively far away (4.98 Å) from the iron because of its intramolecular
hydrogen bond”! (Figs 3h and 4d). Such a planar conformation
may prevent the abstractable hydrogen adopting a favourable orien- 
tation for abstraction reaction and thus result in low catalytic effi- 
ciency. In contrast, the abstractable hydrogens in 5mC have no such 
restriction (C-C bond can freely rotate) and would adopt a favourable 
conformation for hydrogen abstraction. As for the 5hmC in the pre- 
catalysis structure, the hydroxyl group forms a hydrogen bond with the 
C-1 carboxyl group of 2-OG, which positions one of its abstractable 
hydrogens close (3.74 Å) to the iron (Fig. 4d). However, this hydrogen
bond would not be maintained after the decarboxylation of 2-OG,
in which the C-1 carboxyl group is converted into a CO₂ molecule
(Fig. 4a). Further calculations indicate that free 5hmC and 5fC have
the tendency to form intramolecular hydrogen bonds, which may prevent
the hydrogen abstraction and lead to the reduced activity (Fig. 4e, f).
Crystal structures of the catalytic intermediate states of TET2 would 
provide additional structural evidence for understanding the mecha- 
nism for TET-mediated oxidation. Note that hydrogen abstraction is 
not always the rate-limiting step for 2-OG/Fe(II)-dependent dioxy- 
genases. Interestingly, iterative oxidation on different substrates was 
observed in several 2-OG/Fe(II)-dependent dioxygenases”””>-*”. Thus, 
it is of interest to test whether a similar mechanism applies for iterative 
oxidation mediated by other 2-OG/Fe(II)-dependent dioxygenases. 

In summary, this work reveals that TET proteins are more active 
on 5mC-DNA than 5hmC/5fC-DNA substrates, which results from 
the distinct intrinsic properties of 5mC/5hmC/5fC within the cata- 
lytic cavity of TET proteins during oxidation. Thus, once established 
in the genome, 5hmC is less prone to further oxidization, unless TET 
proteins are stimulated to be more active. Regulation of the genomic 
localization and/or enzymatic activity of TET proteins may precisely 
control the patterns of 5mC and its oxidized derivatives. For example, 
vitamin C enhances TET activity and significantly increases levels of 
5hmC/5fC/5caC in mouse embryonic stem cells and regulates somatic 
cell reprogramming”. TET proteins are therefore evolutionarily tuned 
to be less reactive towards 5hmC, perhaps to facilitate its generation as a 
potentially stable mark for regulatory functions. Genome-wide analyses 
indicate that 5fC and 5caC are mainly observed in specific genomic 
regions”*!°, suggesting that TET might be more concentrated or active 
in these regions by its interacting proteins or some mechanisms yet to 
be discovered. It will also be of interest to investigate the mechanism 
by which TET proteins are either activated to generate more 5fC/5caC 
for DNA demethylation or to retain relatively low activity to generate 
5hmC in vivo. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to allocation during 
experiments and outcome assessment. 

Protein expression and purification. The procedure for protein expression and 
purification of TET2 has been described previously”. In brief, the open reading 
frame corresponding to the catalytic domain of human TET1 (1418-2136) or 
TET2 (1129-1936 with residues 1481-1843 replaced by a 15-residue GS-linker 
GGGGSGGGGSGGGGS) was sub-cloned into modified pGEX-6p-1 or pET-28b 
vector and the plasmids were transformed into Escherichia coli strain BL21(DE3). 
The transformants were grown at 37°C to an absorbance at 600 nm of 0.8 and 
induced by adding 0.1 mM isopropyl-β-D-thiogalactopyranoside. After further
culture at 16°C for 14-18 h, the cells expressing TET1 or TET2 were lysed and 
the supernatant was subjected to Ni-NTA columns for affinity purification. His/ 
GST-tag was removed by on-column digestion at 4°C for 12-16h. The eluted 
proteins were further purified by ion exchange and gel filtration chromatography. 
The purified proteins were subjected to SDS–polyacrylamide gel electrophoresis,
stained by Coomassie blue and visualized on a Tanon-5200 Chemiluminescent 
Imaging System (Tanon Science & Technology). The proteins were concentrated 
to 25 mg ml⁻¹ and used for in vitro assays and crystallization.

TET2 enzymatic assays and LC-MS/MS analysis. Procedures for in vitro enzy- 
matic assays and LC-MS/MS were described previously’. In brief, various DNA 
substrates were incubated with TET1 or TET2 in buffer containing 50 mM HEPES
pH 8.0, 100 mM NaCl, 100 µM Fe(NH₄)₂(SO₄)₂, 2 mM ascorbate, 1 mM DTT and
1mM ATP at 37°C. Reactions were stopped by the addition of ten volumes of 
Buffer PN (Qiagen), and the DNA products were then purified using a QIAquick 
Nucleotide Removal Kit (Qiagen) following the manufacturer’s instructions. The 
purified DNA products were denatured at 100°C for 10 min and further digested 
to nucleosides with 0.5 U nuclease P1 (Sigma Aldrich) at 37°C for 16h and 0.5 U 
CIP (NEB) at 37°C for 1.5 h. For the product inhibition assay, 1 µM TET2 was
incubated with biotinylated 26-bp 5mC/5hmC/5fC-DNA substrates in the presence
of corresponding biotin-free 5hmC/5fC/5caC-DNA with the same sequence.
Unmodified CpG-containing DNA was used as control. For the deuterium isotope
effect assay, 0.25 µM or 0.5 µM TET2 was incubated with biotinylated 20-bp
[²H]₃5mC-/[¹H]₃5mC-DNA substrates. After reaction for 10 min at 37°C, followed
by heating at 65°C to deactivate TET2, the biotinylated 20-bp [¹H]₃5mC/[²H]₃5mC-
DNA was purified by streptavidin beads, and then treated as described above. The
samples were subjected to LC-MS/MS using a Shimadzu LC (LC-20AB pump) system.
The amounts of 5mC derivatives were calculated according to the external
standard curves for [¹H]₃5mC, [²H]₃5mC, [¹H]₃5hmC, [²H]₃5hmC, 5fC, 5caC and
guanine (Extended Data Fig. 1).

Preparation of DNA substrates. All DNA duplexes (summarized in the table 
below) were synthesized by Generay Biotech and annealed from single-stranded 
primers. A palindromic 12-bp dsDNA (5’-ACCAC(Chm)GGTGGT-3’,
Chm = 5-hydroxymethyldeoxycytosine) and a hemi-formyl dsDNA (top
strand, 5’-ACTGT(CfG)AAGCT-3’; bottom strand, 5’-AGCTTCGACAGT-3’;
Cf = 5-formyldeoxycytosine) were used for crystallization. Palindromic 12-bp
dsDNAs (D1: top strand, 5’-ACCACXGGTGGT-3’; X= 5mC, 5hmC or 5fC) 
were used for stopped-flow spectrometry analyses. FAM-labelled palindromic 
18-bp dsDNAs (D2: top strand, 5’-FAM-CAGCACACXGGTGTGCTG-3’; X= C, 
5mC, 5hmC, 5fC or 5caC) were used for fluorescence polarization measurements.
Biotinylated 26-bp dsDNAs (D3: top strand, 5’-biotin-CAGTAGTCTGGACACACXGGTCATGA-3’;
bottom strand, 5’-TCATGACXGGTGTGTCCAGACTACTG-3’;
X = 5mC, 5hmC or 5fC) were used for SPR analyses. Palindromic 58-bp
dsDNAs (D5: top strand, 5‘-ACGATCAGATCCTAAGGCATCAGCACACXGGT 
GTGCTGATGCCTTAGGATCTGATCGT-3’; X = 5mC, 5hmC or 5fC) were used 
for enzymatic assays. To test the effect of flanking DNA sequence besides CpG 
dinucleotide on the substrate preference of TET2, we synthesized AT-rich dsDNAs 
(D6: top strand, 5’-ACCAGCAGATGGCCAGGCATCAGATATAXGTATATCTG 
ATGCCTGGCCATCTGCTGGT-3’; X= 5mC, 5hmC or 5fC) and CG-rich 58-bp 
dsDNAs (D7: top strand, 5’-ACTCAACAGACTACACAGTAGTGCCCCCXGCC 
CAGATGCTATTCAGTAACTGACACTG-3’; bottom strand, 5’-CAGTGTCAGT 
TACTGAATAGCATCTGGGXGGGGGGCACTACTGTGTAGTCTGTTGAGT- 
3’; X=5mC, 5hmC or 5fC), compared with the 58-bp dsDNAs (D5). To test the 
effect of DNA length on the substrate preference of TET2, we synthesized 100-bp 
dsDNAs (D8: top strand, 5’-GCTTGGAGGTCCAAGCTAGCTACGATCAGATC 
CTAAGGCATCAGCACACXGGTGTGCTGATGCCTTAGGATCTGATCGTAG 
CTAGCTTGGACCTCCAAGC-3’; X= 5mC, 5hmC or 5fC), compared with 26-bp 
dsDNAs (D4: top strand, 5’-CAGTAGTCTGGACACACXGGTCATGA-3’; bottom 
strand, 5’-TCATGACXGGTGTGTCCAGACTACTG-3’; X = 5C, 5mC, 5hmC, 5fC
or 5caC) and 58-bp dsDNAs (D5). To test product inhibition on TET2 activity, 
we used biotinylated 26-bp dsDNAs (D3) and corresponding biotin-free 26-bp 
dsDNAs (D4) with the same sequences. Biotinylated 20-bp dsDNA with deuterium-replaced
5mC (D9: top strand, 5’-biotin-CTTGGACACACXGGTCATGA-3’;
bottom strand, 5’-TCATGACCmGGTGTGTCCAAG-3’; X = [²H]₃5mC) or regular
5mC (D10: top strand, 5’-biotin-CTTGGACACACXGGTCATGA-3’; bottom
strand, 5’-TCATGACCmGGTGTGTCCAAG-3’; X = 5mC) were used to test the
isotope effect on TET2 activity. The DNAs used in this study are summarized in 
Supplementary Table 1. 

Protein crystallization. Procedures for protein purification of TET2 have been 
described previously’. In brief, human TET1 (1418-2136) or TET2 (1129-1936
with residues 1481-1843 replaced by a 15-residue GS-linker) was purified to 
homogeneity for crystallization and enzymatic activity assays. For simplicity, the 
two proteins were designated as TET1 and TET2 in this work. Sequences of DNA 
used for crystallization and assays are described in the Preparation of DNA substrates
section above. The crystals of human TET2 in complex with 12-bp 5hmC-
containing DNA were obtained using the hanging-drop, vapour-diffusion method
by mixing 1 µl protein-DNA complex solution (25 mg ml⁻¹) with 1 µl reservoir
solution containing 0.1 M MES (pH 6.4), 26% PEG monomethyl ether 2000 at
277 K. For crystallization of TET2 and hemi-formylated dsDNA complex, 1.5 µl
protein-DNA complex solution (20 mg ml⁻¹) and 1.5 µl reservoir solution containing
0.1 M MES (pH 6.3), 21% PEG monomethyl ether 2000 were mixed and
equilibrated by hanging-drop, vapour-diffusion at 277 K.

Data collection and structure determination. The data of TET2-5hmC-DNA 
and TET2-5fC-DNA were collected at wavelengths of 0.9792 Å and 0.97876 Å at
Shanghai Synchrotron Radiation Facility beamlines BL17U and BL19U, respec- 
tively. Data were indexed, integrated and scaled using program HKL2000 (ref. 29). 
The structure was solved by molecular replacement using the TET2-5mC-DNA 
complex structure (PDB 4NM6) as the searching model’. The initial models were 
manually built with COOT™ and refined using PHENIX package". The quality of 
the final model was validated with the program MolProbity”, indicating that 98.3% 
of residues were in favoured regions, 1.5% in allowed regions and 0.2% in outlier 
regions for TET2-5hmC-DNA, and 97.1% residues were in favoured regions, 2.7% 
in allowed regions and 0.2% in outlier regions for TET2-5fC-DNA. All structure 
figures were generated using PyMOL*. 

Fluorescence polarization measurements. Various modifications of FAM-18-bp 
DNA (5nM) were mixed with increasing amounts of TET2. The mixtures were 
incubated in buffer containing 10mM HEPES pH 7.0, 100mM NaCl for 30 min 
at 4°C. To measure DNA-binding affinity for different substrates under catalytic 
conditions, buffer A containing 100 µM Fe²⁺ and 1 mM NOG, and buffer B containing
100 µM Mn²⁺ and 1 mM 2-OG, were used to mimic catalytic conditions.
In addition, buffer C containing 100 µM Fe²⁺ and 1 mM succinate was used to
mimic the product release condition. Fluorescence polarization measurements 
were performed at 25°C on a Synergy 4 Microplate Reader (BioTek). The data from 
three independent experiments were fitted using GraphPad Prism 5. 

SPR measurements. Biotinylated-DNA was coupled to SA-chip (GE Healthcare) 
with a response of 20-30 response units, which was achieved by adjusting the 
concentration of the oligonucleotides and the time of contact. All SPR measurements
were performed using a Biacore T100 instrument in running buffer containing
10 mM HEPES pH 7.4, 150 mM NaCl, 0.005% surfactant P20 at a flow rate of
30 µl min⁻¹ and a temperature of 25°C. Increasing concentrations of TET2 (0.5 µM,
0.75 µM, 1 µM, 1.5 µM, 2 µM) were injected into the same surface in running buffer
for 60 s. The surface was washed with running buffer for 200 s after the dissociation
of the complexes. The data were analysed by fitting all curves using a two-state 
binding model to determine the kinetics association and dissociation rate constants 
in Biacore T100 evaluation software. 

Stopped-flow spectrometry. Stopped-flow kinetic experiments were performed 
at 25°C with an SF-61 DX2 double-mixing instrument and a Xe lamp (SF-61DX2, 
TgK Scientific). For all experiments in this study, reactions were monitored in 
PM mode at 318nm (to monitor the ferryl-oxo intermediate). Reaction mixture 
A containing 0.5 mM TET2 and 1 mM Fe(NH₄)₂(SO₄)₂ in O₂-free buffer solution
(10 mM HEPES pH 7.0, 100 mM NaCl, 10 mM β-ME) was prepared under a high-purity
N₂ atmosphere in an MBraun glove box. Reaction mixture B containing
2 mM 2-OG, 1 mM ATP and 0.5 mM 12-bp 5mC/5hmC/5fC dsDNA in the buffer
solution (10 mM HEPES pH 7.0, 100 mM NaCl, 10 mM β-ME) was prepared in air
environment. The two reactions (A and B) were mixed rapidly and the ferryl-oxo 
intermediate was monitored at 318 nm. Absorbance changes as a function of time 
were recorded and all curves were plotted with Origin software. 

Computational details. The homolytic C-H BDEs for the 5-substitution groups 
of 5mC/5hmC/5fC were calculated using Gaussian 09 (ref. 34) as the enthalpy 
change of the following reaction at 298.15 K and 101.3 kPa: R–H → R• + H•,
where R–H, R• and H• represent the parent base, the corresponding radical and
the hydrogen atom, respectively. The initial geometry of each species was optimized 
at the UB3LYP/6-311+G(d,p) level. Frequency calculations at the same level were 
conducted to verify that the optimized structures were the real minima without any
imaginary vibration frequency. The single point energy and thermal corrections for 
the optimized structures were calculated by using several high-precision composite 
methods implemented in Gaussian 09, including CBS-QB3 (ref. 35), G4 (ref. 36), 
G3(MP2B3) (ref. 37) and CBS-QB3, with a conductor-like polarizable continuum 
model (C-PCM) implicit water model*®. 
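
The enthalpy bookkeeping behind such a BDE can be sketched as follows. This is a minimal illustration under our own assumptions: the function name is ours, the enthalpies are made-up placeholder numbers rather than results from this study, and the inputs are taken to be absolute enthalpies in hartree at 298.15 K as a quantum chemistry package would typically report them.

```python
# Standard conversion factor between hartree and kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.5095


def bond_dissociation_enthalpy(h_parent: float, h_radical: float, h_hydrogen: float) -> float:
    """Enthalpy change of R-H -> R(radical) + H(atom), in kcal/mol.

    All three inputs are absolute enthalpies in hartree, computed at the
    same level of theory and temperature.
    """
    delta_hartree = h_radical + h_hydrogen - h_parent
    return delta_hartree * HARTREE_TO_KCAL_PER_MOL


if __name__ == "__main__":
    # Placeholder enthalpies (hypothetical values, for illustration only).
    bde = bond_dissociation_enthalpy(
        h_parent=-100.000000,
        h_radical=-99.350000,
        h_hydrogen=-0.497500,
    )
    print(f"C-H BDE = {bde:.2f} kcal/mol")
```

Because the hydrogen-atom term is the same for every substrate at a given level of theory, differences between BDEs depend only on the parent and radical enthalpies.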

To identify the low energy conformations of 5hmC, a relaxed potential energy 
surface scan on the dihedral angle of the bond between the C5 carbon and the 
carbon atom of the hydroxymethyl group was performed at the B3LYP/6-31G(d,p)
level. The conformations with the lowest energies on the potential
energy surface were then fully optimized at the B3LYP/6-311+G(d,p) level. The
free energies for the low energy conformations and the conformations observed 
in the TET2-5hmC crystal structure were calculated by using the CBS-QB3 
method. 

Molecular dynamics simulations of the free 5hmC and 5fC nucleotides were
performed using the Amber 11 package. The semi-empirical AM1 method was 
used to describe the nucleotide. The generalized-Born implicit solvent model*® 
was used to mimic the environment within the binding pocket. Molecular 
dynamics simulation (10 ns) was performed for each model and 10,000 snapshots 
from the molecular dynamics trajectory were used to estimate the occupancy of 
the intramolecular hydrogen bond. The criteria for hydrogen bond formation 
were defined as (1) the O-H distance less than 2.5 Å and (2) the N-H-O angle
larger than 90°. 
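
The occupancy estimate described above can be sketched in a few lines. This is an illustrative reimplementation under our own assumptions, not the analysis script used in the study: each snapshot is assumed to be reduced to three Cartesian coordinates in Å (donor nitrogen, hydrogen, acceptor oxygen), and the function names are hypothetical. The angle is measured at the hydrogen, between the H to N and H to O vectors.

```python
import math
from typing import Iterable, Sequence, Tuple

Point = Sequence[float]
Frame = Tuple[Point, Point, Point]  # (donor N, hydrogen, acceptor O)


def _sub(a: Point, b: Point) -> Tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(c * c for c in v))


def is_hydrogen_bond(n: Point, h: Point, o: Point,
                     max_distance: float = 2.5, min_angle: float = 90.0) -> bool:
    """Apply the two criteria: O-H distance < 2.5 Å and N-H-O angle > 90 degrees."""
    h_to_o = _sub(o, h)
    h_to_n = _sub(n, h)
    d_ho, d_hn = _norm(h_to_o), _norm(h_to_n)
    if d_ho == 0.0 or d_hn == 0.0 or d_ho >= max_distance:
        return False
    cos_angle = sum(x * y for x, y in zip(h_to_n, h_to_o)) / (d_hn * d_ho)
    # Clamp to guard against rounding slightly outside [-1, 1].
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return angle > min_angle


def occupancy_percent(frames: Iterable[Frame]) -> float:
    """Percentage of snapshots in which the hydrogen bond is present."""
    frames = list(frames)
    if not frames:
        raise ValueError("no snapshots supplied")
    hits = sum(is_hydrogen_bond(n, h, o) for n, h, o in frames)
    return 100.0 * hits / len(frames)
```

With 10,000 snapshots, as used here, the occupancy is resolved in steps of 0.01%.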
Extended Data Figure 1 | Steady-state kinetic analyses for TET-mediated
oxidation on 5mC/5hmC/5fC-DNA substrates. a, Protein purification
of human TET1 catalytic domain. Representative gel-filtration profile
of human TET1 (residues 1418-2136) is shown. The peak position is
about 13 ml, which corresponds to the monomer of TET1 with molecular
mass of about 79 kilodaltons. The peak fractions were subjected to SDS-
polyacrylamide gel electrophoresis and stained by Coomassie blue. The
column used for gel filtration was Superdex 200 (GE Healthcare, 10/300 GL).
b, Standard curves for 5mC, 5hmC, 5fC, 5caC, [²H]₃5mC, [²H]₃5hmC and
guanine for quantification in LC-MS/MS. Good linearity was obtained for
the range of guanine and various cytosine derivatives as indicated. The
level of guanine (equal to total cytosine and its derivatives) was detected.
Note that three standard curves were generated for 5mC/5hmC/5fC
in low/mid/high concentrations, respectively. Two standard curves were
generated for 5caC and [²H]₃5mC in mid/low-concentration. c–e, Reaction
progress curves of substrate and fraction products versus incubation time
(2.5 min). Initial rates (nanomoles per second) for product generation
were measured using various concentrations of dsDNA substrate and
TET2. The reactions were conducted using 58-bp dsDNA (D5, one central
5mC/5hmC/5fCpG site) as substrate. Quantification was calculated from
two independent assays; error bars, s.d. for duplicate experiments. f–h, To
avoid generation of multiple products, we optimized protein concentration
so that only one product was detected for all the reactions under our
experimental conditions. The product generations are shown for the
reactions with the highest substrate concentration for the longest time.
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Extended Data Figure 2 | Similar substrate preference of TET2 towards 
DNA with different lengths and sequences. a—e, Enzymatic activity of 
TET2 was measured on 5mC/5hmC/5fC-DNA substrates (one central 
5mC/5hmC/5fCpG site) of different lengths (a—c) or sequences (b, d, e). 
dsDNA substrates of 26 bp, 58 bp and 100 bp were used for reactions to test 
the effect of DNA length on substrate preference of TET2 (a-c). AT- and 
CG-rich dsDNA substrates of 58 bp were used for reactions to compare 
the effect of DNA sequence on substrate preference of TET2 (b, d, e). 
TET2 (11M) was used for all reactions. Quantification was calculated 
from three independent assays; error bars, s.d. for triplicate experiments. 
TET2 showed similar substrate preference (higher activity on 5nC-DNA 
than on 5hmC/5fC-DNA) for all DNA substrates tested. In addition, TET2 


+ 26bp 5CpG DNA 1.4uM 
Ba 26bp 5caCpG DNA 0.7uM 
+ 26bp 5caCpG DNA 1.4uM 


has no sequence preference apart from CpG dinucleotide. Notably, TET2
showed higher activity on shorter DNA substrate under our experimental
conditions. This is not surprising, because the enzyme should be able to find
one CpG site more easily on a shorter DNA substrate than on a longer
one. f–h, Effects of substrate/product on enzymatic activities of TET2.
f, Enzymatic activities of 1 μM TET2 for 26-bp 5mC-DNA in the presence
of 26-bp 5C/5hmC-DNA. g, Enzymatic activities of 1 μM TET2 for 26-bp
5hmC-DNA in the presence of 26-bp 5C/5fC-DNA. h, Enzymatic activities
of 1 μM TET2 for 26-bp 5fC-DNA in the presence of 26-bp 5C/5caC-DNA.
The presence of CpG-DNA or different substrate/products has negligible 
impact on TET2 activity.

Extended Data Figure 3 | Fluorescence polarization measurements of
DNA-binding affinities of TET2 in different conditions. a–c, Fluorescence
polarization measurements of substrate DNA-binding affinities of TET2
(10 mM HEPES pH 7.0, 100 mM NaCl) in the presence of 100 μM Mn²⁺
and 1 mM 2-OG (a), 100 μM Fe²⁺ and 1 mM NOG (b) and 100 μM Fe²⁺ and
1 mM succinate (c). No significant difference was observed for the DNA-
binding affinity of TET2 for different substrate/product under conditions
mimicking oxidation or after oxidation.

Extended Data Figure 4 | Structural comparison of TET2-5mC-DNA,
TET2-5hmC-DNA and TET2-5fC-DNA complexes. a-c, Structural 
comparison of the three complexes. The individual structures of the three 
complexes are shown on the left panel and the superimposed structures are 
shown on the right. The structures are shown in ribbon representations. 
TET2-5hmC-DNA and TET2-5fC-DNA are coloured as in Fig. 3b, and 
TET2-5mC-DNA is coloured in grey. The colour scheme is indicated. Stick 
representations show 5mC and 5hmC in two structures. d, Close-up view for 
the comparison of 5mC, 5hmC and 5fC in the three structures. Note that the 
cytosine portions of the three bases adopt almost identical conformations 
within the catalytic cavity. e–g, Close-up views of the catalytic DSBH
core of the three structures, shown in ribbon representation with critical
residues indicated in stick representations. The nitrogen and oxygen atoms
are coloured in blue and red, respectively. Hydrogen bonds and Fe(II) 
coordination are indicated as dashed lines. Fe(II) and crystallographic water 
molecules are shown as red and green balls, respectively. The cytosine of 
5hmC is specifically recognized by TET2 in a similar manner to that of 5mC 
in the TET2-5mC-DNA structure. An additional hydrogen bond between 
the hydroxymethyl group of 5hmC and NOG was observed in the TET2-
5hmC-DNA structure. The additional hydrogen bond may not be strong 
enough to affect the binding affinity between TET2 and DNA because the 
interaction is mediated by extensive hydrogen bonds and hydrophobic 
interactions (Extended Data Fig. 5).

Extended Data Figure 5 | Recognition of CpG dinucleotide by TET2.
a, Two different views of the interaction between the G7:hmC7′ base pair
and TET2 for the specific recognition of CpG dinucleotide in TET2-5hmC-
DNA. Water-mediated hydrogen bonds are formed between base hmC7′
of DNA and residues Y1295 and R1302 of TET2. The 2Fobserved − Fcalculated
simulated annealing omit maps for residues involved in CpG recognition
outside catalytic cavity are shown. The maps were calculated at 1.80 Å and
contoured at 1.0σ. b, Two different views of the interaction between the
G7:C7′ base pair and TET2 for the specific recognition of CpG dinucleotide
in TET2-5fC-DNA. The 2Fobserved − Fcalculated simulated annealing omit
maps were calculated at 1.97 Å and contoured at 1.0σ. c, d, Close-up views
of the interactions between TET2 and 5hmC-DNA (c) and TET2 and 5fC-
DNA (d). Critical bases of DNA and residues of TET2 for the interactions
are shown in stick representations. Hydrogen bonds are indicated as dashed
lines. Water molecule is shown as a green ball. The nitrogen, oxygen and
phosphorus atoms are coloured in blue, red and orange, respectively.
The 2Fobserved − Fcalculated simulated annealing omit maps for residues
involved in DNA interaction are shown with these residues omitted from 
the calculation. Most residues are well covered by the electron density, 
indicating that these residues were built correctly in the structural model. 
e, f, Representation of intermolecular contacts between TET2 and 
5hmC-DNA (e) and 5fC-DNA (f).

Extended Data Figure 6 | Kinetic traces from the reactions of
TET2-Fe(II)-2-OG. Traces from the reactions in the absence (a) and
presence (b) of unmethylated CpG-DNA or random DNA (c). The
formation of catalytic intermediate for each reaction was monitored by
stopped-flow absorption at 318 nm. No decay of catalytic intermediate was
observed for any of the measurements. These analyses serve as a negative
control for the assays shown in Fig. 4c.

Extended Data Table 1 | DNA-binding affinities of TET2


a.
Kd (μM), FAM-18-bp dsDNA
5C      0.54±0.06   0.81±0.09   0.80±0.08
5mC     0.99±0.15   1.34±0.18   0.76±0.06
5hmC    0.41±0.07   0.97±0.13   0.60±0.04   0.48±0.02
5fC     0.56±0.10   0.62±0.09   0.53±0.05   0.54±0.02
5caC    0.25±0.01

b.
              5mC               5hmC              5fC
ka1 (1/Ms)    8.2E+4 ± 2.8E+3   9.9E+4 ± 2.1E+3   1.6E+5 ± 4.5E+3
kd1 (1/s)     8.0E-2 ± 2.9E-3   1.8E-1 ± 2.4E-3   2.8E-1 ± 5.0E-3
ka2 (1/s)     6.4E-3 ± 2.4E-4   2.8E-3 ± 5.4E-5   3.5E-3 ± 6.7E-5
kd2 (1/s)     1.5E-2 ± 3.6E-4   4.2E-3 ± 1.5E-4   3.7E-3 ± 1.3E-4
KD (M)        6.8E-7            1.1E-6            9.0E-7
Rmax (RU)     54.95             58.44             40.87
Chi² (RU²)    0.493             0.258             0.198


a, Fluorescence polarization measurements of DNA-binding affinities of TET2. FAM-labelled 18-bp DNA (5 nM) was incubated with increasing amounts of TET2 in a buffer containing 10 mM HEPES pH 7.0 and
100 mM NaCl. Mn²⁺ (100 μM) and 2-OG (1 mM), or 100 μM Fe²⁺ and 1 mM NOG, were added to mimic the catalytic condition. Fe²⁺ (100 μM) and succinate (1 mM) were added to mimic the product release
process. The binding affinities were measured using fluorescence polarization. Error bars, s.d. for triplicate experiments from three independent experiments. b, SPR measurements of DNA-binding affinities
of TET2. The sensorgrams were analysed using BIA evaluation software (Biacore). The response curves of various protein concentrations were fitted according to the two-state binding model described by the
following equation.


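$$A + B \underset{k_{d1}}{\overset{k_{a1}}{\rightleftharpoons}} AB \underset{k_{d2}}{\overset{k_{a2}}{\rightleftharpoons}} AB^{*}$$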


TET2 binds to 5mC, 5hmC or 5fC through a base-flipping mechanism. The process by which TET2 recognizes modified DNA is considered the first step, with the binding affinity calculated by the equation
$K_{A1} = k_{a1}/k_{d1}$. The second step is base flipping, and the binding affinity is calculated by the equation $K_{A2} = k_{a2}/k_{d2}$. The overall equilibrium binding constant is calculated by the equation $K_A = K_{A1}(1 + K_{A2})$ and
$K_D = 1/K_A$. The values of χ² for all three measurements are less than 1% of Rmax.
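
As a rough check of the two-state relations above, the overall dissociation constant can be recomputed from the four rate constants listed in part b. This is only a sketch: the function name `overall_kd` and the `RATES` dictionary are illustrative, and it assumes that the four rate rows of the table are, in order, the association and dissociation constants of step one followed by those of step two.

```python
def overall_kd(ka1, kd1, ka2, kd2):
    """Overall KD (M) for a two-state binding model.

    ka1 in 1/(M s); kd1, ka2 and kd2 in 1/s.
    """
    k_a1 = ka1 / kd1            # first step: recognition, KA1 = ka1/kd1 (1/M)
    k_a2 = ka2 / kd2            # second step: base flipping, KA2 = ka2/kd2 (unitless)
    k_a = k_a1 * (1.0 + k_a2)   # overall association constant, KA = KA1(1 + KA2)
    return 1.0 / k_a            # KD = 1/KA


# Rate constants copied from part b (assumed row order: ka1, kd1, ka2, kd2).
RATES = {
    "5mC": (8.2e4, 8.0e-2, 6.4e-3, 1.5e-2),
    "5hmC": (9.9e4, 1.8e-1, 2.8e-3, 4.2e-3),
    "5fC": (1.6e5, 2.8e-1, 3.5e-3, 3.7e-3),
}

for base, rates in RATES.items():
    print(f"{base}: KD = {overall_kd(*rates):.1e} M")
# 5mC: KD = 6.8e-07 M
# 5hmC: KD = 1.1e-06 M
# 5fC: KD = 9.0e-07 M
```

Under that row assignment, the recomputed values agree with the KD row of the table to the two significant figures reported there.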


© 2015 Macmillan Publishers Limited. All rights reserved 




Extended Data Table 2 | Data collection and refinement statistics 


Data collection
Space group
Cell dimensions
a, b, c (Å)
α, β, γ (°)
Resolution (Å)
Rsym or Rmerge
I/σI
Completeness (%)
Redundancy

Refinement
Resolution (Å)
No. reflections
Rwork/Rfree
No. atoms
Protein
DNA
OGA
Ligand/ion
Water
B-factors
Protein
DNA
OGA
Ligand/ion
Water
R.m.s. deviations
Bond lengths (Å)
Bond angles (°)

*Highest resolution shell is shown in parentheses.


TET2-5hmC-DNA 


C222₁


48.3, 87.5, 260.9 
90.0, 90.0, 90.0 

50 - 1.80 (1.86 - 1.80) * 
0.094 (0.828) 

19.2 (1.9) 

98.0 (88.6) 

9.6 (4.8) 


1.80 
50686 


0.177/0.212 


33.0 


46.6 


0.006 


TET2-5fC-DNA 


C222₁


48.2, 88.0, 268.0 
90.0, 90.0, 90.0 

50 - 1.97 (2.04 - 1.97) * 
0.055 (0.624) 

28.0 (2.2) 

98.9 (93.2) 

6.0 (4.7) 


1.97 
57919 


0.205/0.248 


262

Extended Data Table 3 | Calculated C-H BDE for the 5-substitution groups of 5mC, 5hmC and 5fC


C-H BDE (kcal/mol) at 298.15 K

Base    CBS-QB3   G4      G3(MP2B3)   CBS-QB3 (CPCM)
5mC     89.74     88.46   91.01       90.39
5hmC    87.51     86.16   88.76       86.20
5fC     91.98     89.02   91.89       92.89


A bond cleavage reaction (R–H → R• + H•) was used to calculate each C-H BDE as the difference in enthalpies (ΔH) between the parent base (R–H) and the corresponding radical (R•) plus hydrogen atom (H•).
Energies were calculated by using several high-precision computational methods implemented in Gaussian 09, including CBS-QB3, G4, G3(MP2B3) and CBS-QB3 with a C-PCM implicit water model. The
experimentally estimated gas phase BDE for the corresponding C-H bonds in Ph-CH3, Ph-CH2OH and Ph-COH were reported to be 89.7 ± 1.2 kcal mol⁻¹, 79.0 ± 2.0 kcal mol⁻¹ and 88.7 ± 2.6 kcal mol⁻¹,
respectively*2.
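
Written out as an equation, the definition in the legend amounts to the following. The symbol H(X) for the computed enthalpy of species X at 298.15 K is notation introduced here for illustration, not taken from the table.

```latex
\mathrm{BDE}(\mathrm{C{-}H})
  = \Delta H_{298.15\,\mathrm{K}}
  = H(\mathrm{R}^{\bullet}) + H(\mathrm{H}^{\bullet}) - H(\mathrm{R{-}H})
```

A lower value means that less enthalpy is needed to abstract the hydrogen atom from the 5-substituent.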
ILLUSTRATION BY THE PROJECT TWINS.


TOOLBOX 


EIGHT WAYS TO CLEAN 
A DIGITAL LIBRARY 


Scientists have a surfeit of options to choose from in the competitive 
market of reference-management software. 


BY JEFFREY M. PERKEL 


Adam Rocker didn’t expect the software
that managed his digital reference
library to flag up better ways he could
be doing his research. But his electronic filing 
system of choice, ReadCube, periodically scans 
his library and suggests related papers, rather as 
some music-file-management programs high- 
light recommended tunes. And that feature, he 
says, has brought up some unexpected gems. 
As a graduate student, Rocker, who is 
now studying medicine at the University of 
Ottawa, was researching bacterial infections 
in zebrafish. ReadCube highlighted a paper
that described a way to entrap the fish using
microfluidics (a field whose literature he
would not normally read) that was much easier
than his own method. Being alerted to the
research was “really rewarding”, Rocker says, 
although he was ultimately too invested in his 
own project to adopt the alternative approach. 

As Rocker discovered, today’s reference- 
management tools go above and beyond sim- 
ple electronic filing. Rather like a Swiss-army 
knife, each tool now appeals to customers by 
offering an ever-evolving set of extra features. 

This article focuses on eight tools — colwiz, 
EndNote, F1000Workspace, Mendeley, Papers, 
ReadCube, RefME and Zotero — all compet- 
ing in the reference-management market (see 
‘Reference-management software’). Some
excel at streamlining the process of browsing
and building literature libraries, whereas 
others focus on creating bibliographies, aid- 
ing collaboration through the use of shared 
workspaces or recommending papers. (One, 
ReadCube, is owned by Digital Science, a firm 
operated by the Holtzbrinck Publishing Group, 
which also has a share in Nature's publisher.) 
Each tool exists to help researchers to tame 
the digital flotsam and jetsam of scattered, 
downloaded PDFs. Most scientists can relate 
to that problem: as they grab PDFs from jour- 
nal websites — where they are often assigned 
impenetrable alphanumeric codes as filenames 
— and dump them into any convenient folder, 
chaos can quickly take hold, with multiple
copies of files spread across hard disks.

“In science, or at least in my experience, we
tend to end up with a folder in the desktop with
3,000 really weirdly named PDF files, which
we can never find when we need them,” says
Raúl Delgado-Morales, a neuroscientist at
the Bellvitge Biomedical Research Institute in 
Barcelona, Spain. 

Reference-management tools address that 
confusion by indexing a hard disk. Typically, the 
process of dragging and dropping a PDF into 
an application window triggers the software to 
try to identify it using the DOI or title, and to 
retrieve relevant metadata (such as title, key- 
word and author names) from online servers. 
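
The identification step described above can be sketched in a few lines. This is a minimal illustration, not the method used by any of the tools named in this article: the function name `find_doi` and the sample strings are hypothetical, and the pattern is a common heuristic for modern DOIs rather than a complete grammar.

```python
import re

# Heuristic pattern for modern DOIs: the "10." prefix, a 4-9 digit
# registrant code, a slash, then a suffix of commonly used characters.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_doi(text):
    """Return the first DOI-like string found in text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trailing punctuation is usually part of the sentence, not the DOI.
    return match.group(0).rstrip(".,;)")


if __name__ == "__main__":
    # Hypothetical text pulled from the first page of a PDF.
    print(find_doi("See doi:10.1234/example.5678."))
    # A cryptic file name with no DOI in it gives None.
    print(find_doi("3000_weirdly_named.pdf"))
```

A real tool would then send the recovered DOI to an online metadata service to fetch the title, authors and keywords. Stripping a trailing parenthesis is a simplification that would truncate the rare DOI that legitimately ends in one.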

Researchers can also assign software to 
monitor specific folders into which they drop 
their files. They can then find PDFs through 
a simple search for author name, keyword 
or, in some cases, their own notes. Delgado- 
Morales solved his problem, for example, by 
organizing his literature library with Papers, 
a user-friendly application that automatically 
renames files according to any scheme he 
chooses. Other tools offer similar functions, 
except for RefME — a website and mobile app 
— which stores only lists of references and not 
the PDFs themselves. 


CORE FUNCTIONS 

Most of the tools help researchers to import 
literature from a variety of online sources. 
Many offer in-app searching of external data- 
bases such as PubMed and Google Scholar, as 
well as web-browser plugins that grab reference
data (and sometimes, associated PDFs) from journal
websites and other pages.

Zotero, a free, open-source software project, was
founded ten years ago specifically to tackle the
problem of extracting information from a web 
browser, says project director Sean Takats of 
George Mason University in Fairfax, Virginia. 
“That’s the key feature of Zotero, and remains
one of its strongest compared to other reference
managers,” he says. RefME offers the unusual
option of adding references by scanning a bar- 
code with a smartphone camera. 

One of the best-known features of reference- 
management software is the ability to insert 
in-text references in a research paper and to 
create bibliographies in any format. EndNote, 
a widely used commercial package, has offered 
this feature for decades, but now faces compe- 
tition from many modern tools. 

Many tools interface with common word- 
processing software (usually Microsoft Word, 
but sometimes OpenOffice and related free- 
ware suites as well) so that a user typing up 
a research article need only select the papers 
that they want to mention and click a button to 
have codes inserted into the document to mark
the in-text reference. Later, the user can create
a bibliography and in-text citations according
to several thousand journal styles, picking his
or her choice from a pull-down list.

REFERENCE-MANAGEMENT SOFTWARE
Eight of the most popular tools.

Product          URL               Platform                              Free?
colwiz           colwiz.com        Desktop/web/mobile                    Yes
EndNote          endnote.com       Desktop/web/mobile                    Yes, with some limited features
F1000Workspace   f1000.com/work/   Web                                   No
Mendeley         mendeley.com      Desktop/web/mobile                    Yes, with some limited features
Papers           papersapp.com     Desktop/web/mobile                    No
ReadCube         readcube.com      Desktop/web                           Yes, with some limited features
RefME            refme.com         Web/mobile (only stores references)   Yes
Zotero           zotero.org        Desktop/web/mobile                    Yes, with some limited features

See the online version of this article at go.nature.com/xbp9ot for a fuller comparison.

Most tools include built-in PDF readers for 
reading and annotating articles — typically 
allowing users to search through comments 
and notes — as well as cloud-based capabili- 
ties for syncing those comments (and the PDFs 
themselves) between, for example, an iPad 
and a desktop computer. But ReadCube and 
colwiz try to offer richer PDF reading experi- 
ences. In ReadCube, for instance, in-line cita- 
tions and author names in PDFs are rendered 
as active hyperlinks to provide direct access to 
cited articles and publication lists. The same 
functionality is available when viewing and 
annotating PDFs on the websites of partnering 
publishers (including, for ReadCube, Nature 
and Wiley; and, for colwiz, Taylor & Francis). 

Many of these tools can identify articles 
related to specific items in a library, or recom- 
mend articles on the basis of the library’s con- 
tent overall. F1000Workspace — like ReadCube 
— uses an algorithm to do this. It also taps into 
recommendations made by a community of 
10,000 or so specialists. However, many other 
stand-alone software products also recommend 
papers (see Nature 513, 129-130; 2014). 


SET TO SHARE 

Many tools now allow researchers to set up 
group libraries or share key papers with distant 
collaborators, although this process is care- 
fully managed to prevent violation of publish- 
ers’ copyright. Those in public groups using 
Mendeley, for instance, can share only infor- 
mation about a paper — the equivalent of a 
library-catalogue entry. Only users in pri- 
vate groups can share and modify PDFs (and 
groups must upgrade to a paid account to add 
more than three individuals). 

Brenton Wiernik, an organizational- 
psychology PhD candidate at the University of 
Minnesota in Minneapolis, uses a shared library 
in Zotero for collaborative projects involving 
systematic reviews and meta-analyses of the 
literature in his field. Such efforts might involve 
15-20 people, he says: some downloading
articles into a shared library; others reading
them; still more adding annotations and tags 
and logging key data. 

According to Wiernik, the process is akin 
to using a shared Dropbox folder, with the 
added benefit that Zotero tracks and maintains 
metadata, notes and annotations. For instance, 
researchers can use a dedicated tag to indicate 
that they are processing an article, thereby sig- 
nalling to collaborators that they should work 
on a different article to avoid duplicated effort.

F1000Workspace and colwiz both extend 
sharing to include features for preparing 
manuscripts and managing projects. With 
F1000Workspace, researchers can use a plugin 
to upload Microsoft Word manuscripts to a 
secure location, thereby enabling team mem- 
bers to comment on the shared copy — although 
the text cannot be edited in the browser, says 
Joao Peres, the company’s product-development 
manager. Peres plans to implement a ‘one-click’
article-submission feature that sends papers 
directly from F1000Workspace to journal edi- 
tors, starting with the journal F1000Research. 
And colwiz also permits users to share docu- 
ments to an online drive for team members to 
view and comment on. 

Given the highly overlapping feature sets of 
these tools, a user’s choice often comes down 
to particular individual priorities. Richard 
Karnesky, a materials scientist at the Sandia 
National Laboratories in Livermore, Califor- 
nia, supports Zotero for its open-source ethos, 
for example. 

Perhaps the best reason for using a reference 
manager is the technology's ability to provide a 
form of searchable memory. Imagine, says Boyd 
Steere, a senior research scientist at pharma- 
ceutical firm Eli Lilly in Indianapolis, Indiana, 
a desk piled high with printed papers: Post-it 
notes hanging out, writing in the margins, doo- 
dles, notations, arrows and more. Today’s PDF- 
filled, digital folders are in many ways no easier 
to navigate. With a digital reference manager, 
however, buried knowledge is just a keyword 
search away.


Jeffrey M. Perkel is a writer based in 
Pocatello, Idaho.

Clear direction


Managing laboratory members as well as a research strategy can be difficult for 
early-career principal investigators, but help is at hand. 


BY BOER DENG 


Vivek Kumar admits that he has
not always been the best manager.
Routinely, the neuroscientist would fail
to provide important details about his expectations
to junior colleagues, then lose his temper
when they did not meet those expectations. In 
the laboratory where he conducted his post- 
doctoral research, for example, Kumar tasked 
the technician with cloning cells but did not 
give her a deadline. She had not completed the 
work when he demanded the clones, and she 
later told him that her blood pressure would rise
whenever she heard him approaching.

The comment might have been difficult 
to hear, but it helped Kumar to realize that he 
needed to improve his management skills. 
When he set up his own lab in January 2015 at 
the Jackson Laboratory in Bar Harbor, Maine, 
he was determined to receive training in how 
to be a good leader, mentor and manager. A few 
months later, Kumar attended a workshop on 
leadership at the Cold Spring Harbor Labora- 
tory in New York. There, he learned about the 
communication and negotiation skills that 
would help him in his role as principal inves- 
tigator (PI). But almost one year on, that role
can still feel uncomfortable. Managing people
remains one of his biggest challenges, Kumar 
acknowledges — especially when it comes to 
having difficult conversations with colleagues 
about expectations. However, the course did 
teach him new skills and tactics. “I came away 
from the workshop with a clear sense that it’s 
part of my responsibility to make the whole lab 
a success.” 

Many junior researchers say that they feel 
poorly prepared for managerial roles. “Knowing
how to do good science, that’s the price of
admission for being a researcher,” says Jeff Gustafson,
an organic chemist who has led a lab
for three years at San Diego State University
in California. “But when I started my own lab, 
there were other things that I just had no idea 
how to do.” Juggling the challenges of teaching 
and administrative duties while guiding the 
members of his lab was a mixture for which he 
had not been prepared. 

Graduate students, junior researchers and 
their institutions have been awakened to the fact 
that, early in their careers, they need to develop 
the interpersonal skills that lab leaders require. 
“Over the past ten years, the interest in learn- 
ing management as scientists has gone from 
a trickle to a small stream,” says Carl Cohen, 
an executive coach for scientists who, in 2011, 
helped to start the leadership programme that 
Kumar attended at the Cold Spring Harbor 
Laboratory. In fact, a number of institutions 
have launched workshops and seminars to teach 
management to postdoctoral researchers and 
junior faculty members (see ‘Learn to lead’).

One reason for the increase in management- 
training options for early-career researchers is 
that although universities are producing more 
researchers, many will not remain in academia. 
Former trainees often enter fields in which 
management skills comprise a significant com- 
ponent of their jobs. “Students and their PIs 
know that they may not have the same careers,” 
says Cohen, who taught and led research in 
molecular haematology at Tufts University 
in Medford, Massachusetts, before holding 
executive positions at several biotechnology 
companies. 


AVOID CONFLICT 

Academic scientists have also realized the 
importance of good management for success. 
For example, it is easier to attract talented 
researchers to a lab that has no conflicts, points 
out Markus Seeliger, who leads a cancer and 
ageing research group at Stony Brook School 
of Medicine in New York. Junior faculty
members can highlight this selling point
to potential recruits, who might otherwise
want to work for more established
researchers.

Kathy Barker, a microbiologist turned
author and management consultant in Seattle,
Washington, has noticed that an increasing 
number of scientists now mentor each other and 
address the cultural and interpersonal aspects 
of science. “In the first lab I worked in, no one 
talked to me for three days because I asked 
the wrong person how to use the autoclave,” 
recalls Barker, who in 2001 published At the 
Helm (Cold Spring Harbor Laboratory Press), 
a management guidebook for inexperienced 
PIs. Her experience spurred her to write about
the importance of management and crafting 
a comfortable culture in which to do science.
These days, many institutions pay attention to
making their labs more welcoming, she says.

LEARN TO LEAD
Management resources abound

Management science has existed for
more than a century. In 1911, engineer
Frederick Taylor outlined the principles
of ‘scientific management’, which aims
to improve productivity in the workplace
through collaboration. Management
resources for early-career researchers are
increasing. Here are a few.

• The Leadership in Bioscience workshop at
the Cold Spring Harbor Laboratory in New
York runs for 3.5 days every February or
March. Aimed at postdoctoral researchers
who are about to take leadership of a lab, as
well as early-career principal investigators,
the workshop accepts around 25 students,
from a pool of about 40 applicants.

• The European Molecular Biology
Organization (EMBO) in Heidelberg,
Germany, holds a comprehensive series
of workshops for early-career scientists.
When they began in 2005, the workshops
were offered only five or six times a year.
Now, they take place 20 times a year, with
each workshop of 16-20 participants filling
quickly. There is a waiting list for EMBO’s
lab-management courses for principal
investigators and postdoctoral researchers.

• The UK-based Vitae online resource
offers career-development advice for
researchers. Registered members around
the world can access tools to learn about
conflict management and coaching for
researchers, as well as other areas of
professional growth.

• The Jackson Laboratory in Bar Harbor,
Maine, offers a course called The Whole
Scientist, which helps graduate and
postdoctoral researchers to make the
leap from acolyte to doyen. Georgetown
University in Washington DC holds a similar
course for early-career researchers.

• And this year, the Van Andel Research
Institute in Grand Rapids, Michigan, began
a series of workshops in leadership and
management skills for scientists that it
plans to continue yearly. B.D.

The field of research, number of members
and culture of each lab bring their own predicaments
for new PIs. “Issues can be quite different
depending on whether you are working in a narrow
field versus a field with lots of collaborative
projects,” says Justin Cotney, a developmental
biologist at the University of Connecticut
Health Center in Farmington. In small labs,
interpersonal relationships between PIs and
lab members are often more important (and
potentially thorny) than in larger labs. Because
PIs are able to spend more time and work more
closely with postdocs and students in a small
group, issues such as a communication problem
or something not working are harder to ignore.

PIs can help by setting expectations and
developing lab protocols that make negative
feelings less likely to crop up. A month or two
after setting up his lab at Georgetown University
Medical Center in Washington DC, neuroscientist
Patrick Forcelli received complaints from his
disgruntled lab manager, who was upset about
mess left in the lab and incomplete paperwork.
Forcelli has since assigned a specific responsibility
for lab upkeep to each member of his group,
and devotes the beginning of the lab’s weekly
meetings to reviewing whether tasks have been
completed. Making lab members accountable to
each other has united everyone behind a shared
standard, and has also made the lab a nicer
place to work.

But sometimes the problems are not so easy
to fix. As in any other workplace, the personalities
and moods of individuals affect the overall
lab environment. PIs must be attuned to how
each member behaves in and perceives the work 
environment. “Knowing the people you work 
with and figuring out what each member of the 
lab will respond to helps you to know when a 
conflict might arise or escalate,” says Cotney.
He learned the lesson firsthand while he was a
postdoc. When a colleague who had been struggling
with personal issues snapped at a new junior
researcher, Cotney stepped in to defuse the
tension. He reminded his colleague not to direct
unreasonable anger at another lab member. “It
was good to be proactive, and is something I do
as a PI.” Forcelli says that in small labs, it is especially
important for PIs to play an active part in
handling conflicts. “I’ve seen cases where the PI
will just be hands-off, which makes the environment
miserable for several people in the lab for
an indefinite period of time,” he adds.

Kumar thinks that training can help 
researchers to appreciate the importance of 
good management. He says that the work- 
shop he attended helped him to better under- 
stand his role and responsibilities. For PIs like 
Kumar, it can be a relief to know that they can 
learn discrete skills for resolving management 
challenges. Perhaps the most important lesson 
is learning to view difficulties as normal and 
tractable. “One thing I take away is that it’s OK
that something falls through, that you don’t
have to be perfect all the time. You realize that
everybody is facing these things,” says Cotney.
“It’s nice to know you’re not alone.”


Boer Deng, a former Nature intern, is the 
Washington DC correspondent for The Times.

ONE SLOW STEP FOR MAN


BY S. R. ALGERNON 


Greetings, Mission Commander!
How are things back on Earth?

I wish our first transmission
from the α-Centauri system could bring
better news, but I’m sorry to say that
Captain Thurgood did not survive
the trip. Something happened to the
CO₂ filters, I’m afraid. The rest
of the crew died as well, some 
sooner than others. That’s 
what the logs say, anyway. 
As for the details ... well, you 
might as well ask me what caused 
the fall of the Roman Empire. 
Questions on that scale don't con- 
cern us much anymore. 

In case you were wondering, no, I’m
not part of the crew. In fact, it took
years to drift over to the communications
console and considerably
longer to figure out how to send a
transmission.

You probably don’t know me. We 
might have passed in the hall back at the 
lab, before your project squeezed ours out 
entirely. Once, my name used to be on your 
office door. 

We designed nanocomputer components. 
Ours were the best in the business, an order 
of magnitude smaller than anybody else's. 
That turned out to be the problem. Comput- 
ers were small enough, now, the committee 
said. Once you can squeeze a petabyte onto 
a grain of sand, they said, you can do just 
about anything. Humanity has no need for 
computers that small. 

I knew I could prove them wrong, if I 
gave it a little thought. That was when I read 
your press releases and noticed the biologi- 
cal samples that were part of your interstellar 
payload. I searched the Internet for ‘tardigrade’,
and I saw my chance.

Dr Ehrlinger had found work on your 
team monitoring the life signs of the bio- 
logical samples ... for a substantial pay cut, 
I should add. I met her for drinks in a diner 
across from the launch facility, and the plan 
fell into place. 

Tardigrade means ‘slow-stepper’. Some
people prefer ‘water bear’, maybe because
they don’t want anybody to think they’re 
slow, but they’re in no hurry. They’ve been 
ambling about for half-a-billion years or 
so. Humans don't faze them one bit. Tardi- 
grades can survive just about anywhere, 
even in deep space. 


Survival instinct. 


Tardigrades are a millimetre or so in 
length, so I don't have quite the same stature 
I once did. I’ve grown, though, in my own 
way. 

Maybe I lied before. Maybe I’m not the 
same person whose office you poached, not 
exactly. I’m a neurocognitive simulation that 
fits inside a 0.1-mm brain case. I might not 
have all the wetware of the original, but I've 
got it where it counts. The human brain has 
a lot of redundancy. It's amazing what you 
can do when you really get serious about 
compression. We even had enough room for 
a little 3D printer on one end, for enhancements
and self-replication.

My entire team is here, including Dr Ehr- 
linger, at least in their computerized forms. 
Our human versions are still on Earth, on 
some quiet little island, somewhere where 
they don’t extradite. It wasn’t too hard to 
smuggle ourselves aboard with our tardi- 
grade hosts. Once we trained them to move 
the way we wanted them to, we had the run 
of the ship. 

I know what you’re thinking. We called
you to gloat about tanking your mission. 
None of us are pilots. The ship is going to 
burn up on re-entry anyway, so who cares
if it's infested by a bunch of vindictive
machines and wayward ‘bugs’?

We're fine with


128 | NATURE | VOL 527 | 5 NOVEMBER 2015 


© 2015 Macmillan Publishers Limited. All rights reserved 


that, actually. We can take it. 
Tardigrades can handle vacuum, radia- 
tion, desiccation, heat and just about 
anything else. Besides, we've 
figured out how to manipulate 
the somatic and germline tis- 
sues of our hosts. We've been 
pushing them to reproduce 
and spurring their evolu- 
tion. It’s thrilling, actually, to 
herd the sperm and egg towards 
each other, creating just the right 
offspring, and then to bury one of 
our newly replicated brains in the 
developing embryo. Dr Ehrlinger 
has a knack for genetics, and we 
don't need our human bodies 
any more to appreciate the joys 
of reproduction. 

If you think the tardigrades were 
hardy before, you ain't seen nothing yet. As 
long as we can manage to point the nose- 
cone somewhere in the neighbourhood of 
a planet, most of us will get through with 
barely a hiccup. 

Isn't that great news? You can tell every- 
one at Mission Control that you've suc- 
ceeded beyond your wildest expectations. 
You can take all the credit if you want. All 
that matters is that we have a home now, and 
a sense of purpose, and a plan for the future. 
We never could have done it without you. 

In fact, most of us don't even hold a grudge 
any more. I have to admit that my program- 
ming was crude at the outset, and revenge 
was a fixation of mine. Our machine-learn- 
ing algorithms and the chunks of code we've 
swapped with each other over countless gen- 
erations have broadened our horizons. 

I like to think that we’ve grown as far 
beyond you in the past few decades as you 
have in the past hundred million years of 
evolution. Maybe I’m underestimating you, 
though. When you get here, we'll find out 
who’s smarter than whom.

Take your time. Slow and steady. That’s 
the tardigrade way. 

We're a patient lot. When you do arrive, 
you'll find us rather laid-back and demo- 
cratic. One sentient organism, one vote, 
and all that. 

Just don't be surprised if by then we outnumber
you by a trillion or so to one.


S. R. Algernon studied fiction writing 

and biology, among other things, at the 
University of North Carolina at Chapel Hill. 
He currently lives in Singapore. 

base pairs of a human genome than to do a brain scan. But
how does all that genomic data translate into treatment? 

Life scientists are bringing together astonishing volumes of 
information from genomic sequencing, lab studies and patient 
records. And the resulting era of ‘precision medicine’ is already
delivering treatments tailored to individual needs. 

These ‘big data’ efforts face huge challenges, from creating
analytic tools and solving scientific puzzles to accessing 
millions of gigabytes of data and overcoming barriers to 
accessing patients’ health records (see pages S2 and S19). 

Dozens of international projects are producing huge 
amounts of biomedical information, not just on the genome 
but on many other ‘-omes’ (S8). Giant strides are being made 
in mapping the human proteome and building a ‘parts list’ of 
the body (S6). Meanwhile, smartphones and other wearable 
devices are generating continuous flows of health data from 
large numbers of people (S12). This vast array of data will allow 
a more detailed understanding of disease traits in analyses 
known as deep phenotyping (S14). Research organizations
are assembling cloud-based ‘information commons’ to 
standardize, store and share the data (S16). 

Drug companies are facing complex choices (S18). Many are 
opting to treat cancer, a main thrust in national programmes 
such as the UK 100,000 Genomes Project (S5). And some of 
these therapies are already changing clinical practice (S10). 

We are pleased to acknowledge that this Outlook was 
produced with support from the National Center for Protein 
Sciences—Beijing, Beijing Proteome Research Center, State 
Key Laboratory of Proteomics, China Human Proteome 
Organization, Beijing Institute of Radiation, and the Academy 
of Military Medical Sciences. As always, Nature retains sole 
responsibility for all editorial content. 

BIG DATA


The power of petabytes


Researchers are struggling to analyse the steadily swelling troves of ‘-omic’ data in the
quest for patient-centred health care. 


BY MICHAEL EISENSTEIN 


Fifteen years ago, it was a landmark
achievement. Ten years ago, it was an
intriguing but highly expensive research
tool. Now, falling costs, soaring accuracy and a
steadily expanding base of scientific knowledge 
have brought genome sequencing to the cusp of 
routine clinical care. 

A growing number of institutions are con- 
ducting genome-wide ‘dragnet’ searches to 
identify the mutations responsible for rare dis- 
eases. “The rate at which we're finding causative 
variants in those cases is going up,” says Russ
Altman, a bioinformatician at Stanford School 
of Medicine in California. “At some centres, it’s 
up to 50% of cases.” Genomic variants can also 
reveal ‘driver’ mutations that might reveal a 
tumour’s therapeutic vulnerabilities, or provide
clues to whether a specific individual may or 
may not respond to a drug — the drug’s ‘phar- 
macogenetic’ properties. 

The US$1,000 genome, initially conceived 
as a price point at which sequencing could
become a component of personalized medi- 
cine, has arrived. “Our capacity for data gen- 
eration relative to price has increased in a 
way that is almost unprecedented in science 
— roughly six orders of magnitude in the past 
seven or eight years,” says Paul Flicek, a special- 
ist in computational genomics at the European 
Molecular Biology Laboratory’s European 
Bioinformatics Institute in Cambridge, UK. 
The HiSeq X Ten system developed by Illu- 
mina of San Diego, California, can sequence 
more than 18,000 human genomes per year, 
for example. 
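
As a rough check on Flicek's figure, the sketch below converts "six orders of magnitude in the past seven or eight years" into an implied doubling time for data generation per unit cost. It assumes smooth exponential growth, which is a simplification, and the function name is illustrative rather than taken from any source.

```python
import math

def implied_doubling_months(orders_of_magnitude: float, years: float) -> float:
    """Months per doubling if output per unit cost grew smoothly by 10**orders over `years`."""
    doublings = orders_of_magnitude * math.log2(10)  # 10**6 is about 2**19.9
    return years * 12 / doublings

# Six orders of magnitude over seven or eight years, as quoted in the article.
for years in (7, 8):
    months = implied_doubling_months(6, years)
    print(f"{years} years -> one doubling every {months:.1f} months")
```

Under that assumption, capacity per unit cost would have doubled roughly every four to five months.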

The biomedical research community is div- 
ing in whole-heartedly, with population-scale 
programmes that are intended to explore 
the clinical power of the genome. In 2014 
the United Kingdom launched the 100,000 
Genomes Project, and both the United States 
(under the Precision Medicine Initiative) and 
China (in a programme to be run by BGI of 
Shenzhen) have unveiled plans to analyse 
genomic data from one million individuals.

Many other programmes are under way 
that, although more regional in focus, are still 
‘big data’ operations. A partnership between
Geisinger Health System, based in Danville, 
Pennsylvania, and biotech firm Regeneron 
Pharmaceuticals of Tarrytown, New York, for 
instance, aims to generate sequence data for 
more than 250,000 people. Meanwhile, a grow- 
ing number of hospitals and service providers 
worldwide are sequencing the genomes of peo- 
ple with cancers or rare hereditary disorders 
(see ‘DNA sequencing soars’). 

Some researchers worry that the flood of 
data could overwhelm the computational 
pipelines needed for analysis and generate 
unprecedented demand for storage — one 
article estimated that the output from genom- 
ics may soon dwarf data heavyweights such 
as YouTube. Many also worry that today’s big 
data lacks the richness to provide clinical value. 
“I don't know if a million genomes is the 
right number, but clearly we need more than 
we've got,” says Marc Williams, director of the
Geisinger Genomic Medicine Institute. 


THE MEANING OF MUTATIONS 

Clinical genomics today is largely focused on 
identifying single-nucleotide variants — indi- 
vidual ‘typos’ in the genomic code that can dis- 
rupt gene function. And rather than looking 
at the full genome, many centres focus instead 
on the exome — the subset of sequences con- 
taining protein-coding genes. This reduces the 
amount of data being analysed nearly 100-fold, 
but the average exome still contains more than 
13,000 single-nucleotide variants. Roughly 2% 
of these are predicted to affect the composition 
of the resulting protein, and finding the culprit 
for a given disease is a daunting challenge. 
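
To put those figures in perspective, here is a back-of-the-envelope sketch using only the round numbers quoted above (13,000 variants per exome, roughly 2% predicted to alter the protein). The function name is illustrative, and the result is an order-of-magnitude estimate, not a measured count.

```python
def protein_altering_candidates(variants_per_exome: int, fraction_affecting_protein: float) -> int:
    """Estimate how many variants in one exome are predicted to alter a protein."""
    return round(variants_per_exome * fraction_affecting_protein)

# Round figures from the article: more than 13,000 variants, roughly 2% protein-altering.
candidates = protein_altering_candidates(13_000, 0.02)
print(f"About {candidates} candidate variants per exome")
```

Even after restricting attention to the exome, a clinician is left with a few hundred candidates per patient to sift for a single culprit.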

For decades, biomedical researchers have 
dutifully deposited their discoveries of single- 
nucleotide variants in public resources such 
as the Human Gene Mutation Database, run 
by the Institute of Medical Genetics at Cardiff 
University, UK, or dbSNP, maintained by the 
US National Center for Biotechnology Infor- 
mation. However, the effects of these muta- 
tions were often determined from cell culture 
or animal models, or even theoretical pre- 
dictions, providing insufficient guidance for 
clinical diagnostic tools. “In many cases, asso- 
ciations were made with relatively low levels of 
evidence,” says Williams.

The situation is even more complicated for 
structural variants, such as duplicated or miss- 
ing chunks of genome sequence, which are far 
more difficult to detect with existing sequenc- 
ing technologies than single-nucleotide vari- 
ants. At the whole-genome scale, each person 
has millions of variants. Many of these are in 
sequences that do not encode proteins but 
instead regulate gene activity, so they can still 
contribute to disease. However, the extent and 
function of these regulatory regions are poorly 
defined. Although capturing all this variability 
is desirable, it may not offer the best short-term 
returns for clinical sequencing. “You're shooting
yourself in the foot if you're collecting data
you don't know how to interpret,” says Altman.

Efforts are now under way to rectify this 
problem. The Clinical Genome Resource, 
which was set up by the US National Human 
Genome Research Institute, is a database of 
disease-related variants, and contains
information that could guide medical
responses to these variants as well as the
evidence supporting those associations.
Genomics England, which runs the 100,000
Genomes Project, aims
to bolster progress in this area by establishing 
‘clinical interpretation partnerships’: doctors 
and researchers will collaborate to establish 
robust models of diseases that can potentially 
be mapped to specific genetic alterations. 


DNA SEQUENCING SOARS 
Human genomes are being sequenced at an ever-increasing rate. The 1000 Genomes Project has
aggregated hundreds of genomes; The Cancer Genome Atlas (TCGA) has gathered several thousand; and
the Exome Aggregation Consortium (ExAC) has sequenced more than 60,000 exomes. Dotted lines show
three possible future growth curves.

However, quantity is as important as quality.
Mutations that offer a strong detrimental effect 
bring an evolutionary disadvantage, so they tend 
to be exceedingly rare and require large sample 
sizes to detect. Establishing statistically mean- 
ingful disease associations for variants with weak 
effects also needs large numbers of people. 

In Iceland, deCODE Genetics has demon- 
strated the power of population-scale genomics, 
combining extensive genealogy and medical- 
history records with genome data from 150,000 
people (including 15,000 whole-genome 
sequences). These findings have allowed 
deCODE to extrapolate the population-wide 
distribution of known genetic risk factors, 
including gene variants linked to breast cancer, 
diabetes and Alzheimer’s disease. 

They have also enabled studies in humans 
that normally require the creation of genetically 
modified animals. “We have established that 
there are about 10,000 Icelanders who have loss- 
of-function mutations in both copies of about 
1,500 different genes,” says Kari Stefansson, the 
company’s chief executive. “We're putting sig- 
nificant effort into figuring out what impact the 
knockout of these genes has on individuals.” 

This work was helped by the homogeneous 
nature of the Icelandic population, but other 
projects require a broadly representative spec- 
trum of donors. Efforts such as the interna- 
tional 1000 Genomes Project have catalogued 
some of the world’s genetic diversity, but most 
data are heavily skewed towards Caucasian 
populations, making them less useful for 
clinical discovery. “Because they come from 
the genetic mother ship, so to speak, people of 
African ancestry carry a lot more genetic vari- 
ants than non-Africans,” says Isaac Kohane, a 
bioinformatician at Harvard Medical School 
in Boston, Massachusetts. “Variants that seem 
unusual in Caucasians might be common in 
Africans, and may not actually cause disease.” 

Part of the problem stems from the reference
genome — the yardstick sequence by which
scientists identify apparent abnormalities,
developed by the multinational Genome
Reference Consortium. The first version was
cobbled together from a few random donors
of undefined ethnicity, but the latest iteration,
known as GRCh38, incorporates more information
about human genomic diversity.

Figure legend (‘DNA sequencing soars’): projections double every 7 months (historical growth rate) or every 12 months (Illumina estimate).
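
The two doubling times in the ‘DNA sequencing soars’ figure legend diverge quickly, as a short sketch makes clear. It computes relative growth only; the 84-month horizon is a hypothetical value chosen for illustration, and the function name is not from any source.

```python
def growth_factor(months_ahead: float, doubling_months: float) -> float:
    """Multiplicative growth after `months_ahead` months for a given doubling time."""
    return 2 ** (months_ahead / doubling_months)

HORIZON_MONTHS = 84  # hypothetical 7-year horizon, chosen only for illustration

# Doubling times taken from the figure legend.
for label, doubling in (("historical growth rate", 7), ("Illumina estimate", 12)):
    factor = growth_factor(HORIZON_MONTHS, doubling)
    print(f"{label}: about {factor:,.0f}-fold growth")
```

Over those 84 months, doubling every 7 months gives 12 doublings (4,096-fold), whereas doubling every 12 months gives 7 doublings (128-fold).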


INTO THE CLOUD 
Harvesting genomes or even exomes at the 
population scale produces a vast amount of 
data, perhaps up to 40 petabytes (40 million 
gigabytes) each year. Nevertheless, raw stor- 
age is not the primary computational concern. 
“Genomicists are a tiny fraction of the people 
who need bigger hard drives,” says Flicek. 
“I don't think storage is a significant problem.”
A greater concern is the amount of variant 
data being analysed from each individual. 
“The computation scales linearly with respect 
to the number of people,” says Marylyn Ritchie,
a genomics researcher at Pennsylvania State 
University in State College. “But as you add 
more variables, it becomes exponential as you 
start to look at different combinations.” This 
becomes particularly problematic if there are 
additional data related to clinical symptoms or 
gene expression. Processing data of this mag- 
nitude from thousands of people can paralyse 
tools for statistical analysis that might work 
adequately in a small laboratory study. 
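
Ritchie's contrast between the two kinds of scaling can be sketched with simple counting. The model below is an assumption made for illustration: it supposes an exhaustive analysis that tests every combination of variants, which real pipelines avoid, and the function names are hypothetical.

```python
from math import comb

def single_variant_tests(n_people: int, n_variants: int) -> int:
    """Work that grows linearly with people: one look at each variant in each person."""
    return n_people * n_variants

def variant_combinations(n_variants: int, max_order: int) -> int:
    """Number of distinct variant sets of size 1..max_order that could be tested jointly."""
    return sum(comb(n_variants, k) for k in range(1, max_order + 1))

# Doubling the cohort doubles the single-variant work.
print(single_variant_tests(1_000, 13_000), single_variant_tests(2_000, 13_000))

# Adding variables is far costlier once combinations are considered.
for n in (10, 20, 40):
    pairs_and_singles = variant_combinations(n, 2)
    every_subset = variant_combinations(n, n)  # equals 2**n - 1
    print(n, pairs_and_singles, every_subset)
```

In this toy model, going from 10 to 40 variables raises the number of possible subsets from about a thousand to about a trillion, which is the exponential blow-up Ritchie describes.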
Scaling up requires improvisation, but there 
is no need to start from scratch. “Fields like 
meteorology, finance and astronomy have 
been integrating different types of data for a 
long time,” says Ritchie. “I've been to meetings 
where I talk to people from Google and Face- 
book, and our ‘big data’ is nothing like their 
big data. We should talk to them, figure out 
how they've done it and adopt it into our field.”
Unfortunately, many talented program- 
mers with the skills to wrangle big data sets 
are lured away by Silicon Valley. Philip Bourne, 
associate director for data science at the US 
National Institutes of Health (NIH), believes
that this is partly due to a lack of recognition 
and advancement within a publication- 
driven system of scientific credit that leaves 
software creators and data managers out in 
the cold. “Some of these people truly want to 
be scholars, but they can’t get the stature of 
faculty — that’s just not right,” says Bourne.

Processing power is another limiting factor.
“This is not a desktop game — the real
practitioners are proficient in massively parallel
computation with hundreds if not thousands
of CPUs, each with large memory,” says
Kohane. Many groups that analyse massive 
amounts of sequence data are moving to 
‘cloud’-based architectures, in which the data 
are deposited within a large pool of computa- 
tional resources and can then be analysed with 
whatever processing power is required. 

“There’s been a gradual evolution towards 
this idea that you bring your algorithms to the 
data,” says Tim Hubbard, head of bioinfor- 
matics at Genomics England. For Genomics 
England, this architecture is contained in a 
secure government facility, with strict control 
over external access. Other research groups are 
turning to commercial cloud systems, such as 
those provided by Amazon or Google. 


PRIVACY PROTECTION 

In principle, cloud-based hosting can encourage 
sharing and collaboration on data sets. But reg- 
ulations on patient consent and privacy rights 
surrounding highly sensitive clinical informa- 
tion pose tricky ethical and legal issues. 

In the European Union, collaboration is 
impeded by member states having different rules 
on data handling. Sharing with non-EU nations 
relies on cumbersome mechanisms to estab- 
lish adequacy of data protection, or restrictive 
bilateral agreements with individual organiza- 
tions. To help solve this problem, a multinational 
coalition, the Global Alliance for Genomics and 
Health, developed the Framework for Respon- 
sible Sharing of Genomic and Health-Related 
Data. The Framework includes guidelines on 
privacy and consent, as well as on accountabil- 
ity and legal consequences for those who break 
the rules. 

“In data-transfer agreements, you could save 
yourself pages and pages of rules if the institu- 
tion, researcher and funder agree to follow the 
Framework,” says Bartha Knoppers, a bioethicist
at McGill University in Montreal, Canada,
who chairs the Alliance’s regulatory and ethics
working group. The Framework also calls for 
‘safe havens’ that allow the research community
to analyse centralized banks of genomic data
that have been identity-masked but not fully
‘de-identified’, so they remain useful. “We want
to link it to clinical data and to medical records,
because we're never going to get to precision
medicine otherwise, so we're going to have to
use coded data,” explains Knoppers.

Image caption: Rapid advances in technology are transforming genomics research.

Integrating genomics into electronic health
records is becoming increasingly important for
many European nations. “Our objective is to put
this into the standard National Health Service,”
says Hubbard. The UK 100,000 Genomes Pro- 
ject may be the furthest along at the moment, 
but other countries are following. Belgium 
recently announced an initiative to explore 
medical genomics, for example. 

All these nations benefit from having cen- 
tralized, government-run health-care systems. 
In the United States, the situation is more frag- 
mented, with different providers relying on 
distinct health-record systems, supplied by dif- 
ferent vendors, that are generally not designed 
to handle complex genomic data. The NIH 
launched the Electronic Medical Records and 
Genomics (eMERGE) Network in 2007 to 
define best practices. 


FROM DATA TO DIAGNOSIS 

The immediate goal of genomically enriched 
health records is to explain the implications 
of gene variants to physicians, and one of its 
earliest implementations is pharmacogenetics. 
The Clinical Pharmacogenetics Implementa- 
tion Consortium has translated known drug- 
gene interactions reported in PharmGKB (a 
database run by Altman and his colleagues) for 
clinical use. For example, people with certain 
variants may respond poorly to particular anti- 
coagulants, leading to increased risk of heart 
attack. “The issue there is, how do you take a 
practitioner who has 12 minutes per patient 
and about 45 seconds of time allocated for pre- 
scribing drugs, and influence their practice in 
a meaningful way?” says Altman. 

As long as deciding how to adapt care to 
genetic findings remains a job for humans, 
this process will remain time- and labour- 
intensive. Nevertheless, combining genotype 
and phenotype information is proving fruit- 
ful from a research perspective. Most clinically 
relevant gene variants were identified through 
genome-wide association studies, in which 
large populations of people with a given disease 
were examined to identify closely associated 
genetic signatures. Researchers can now work
backwards from health records to determine 
what clinical manifestations are prevalent 
among individuals with a given genetic variant. 

And the genome is only part of the story — 
other ‘-omes’ may also be useful barometers 
of health. In July, Jun Wang stepped down as 
chief executive of BGI to start up an organiza- 
tion to analyse BGI’s planned million-genome 
cohort alongside equivalent data sets from the 
proteome, transcriptome and metabolome. “I 
will be initiating a new institution to focus on 
using artificial intelligence to explore this kind 
of big data,” he says. 


IT TAKES PATIENTS 

As researchers strive to integrate data from 
health records and clinical trials with genomic 
and other physiological data, patients are 
starting to contribute. “When we're focused 
on things like behaviour, nutrition, exercise, 
smoking and alcohol, you can’t get better data 
than what patients report,” says Ritchie.

Wearable devices, such as smartphones 
and FitBits, are collecting data on exer- 
cise and heart rate, and the volume of such 
data is soaring (see page S12) as it can
be gathered with minimal effort on the 
wearer's part. 

Each patient may become a big-data pro- 
ducer. “The data we generate at home or in the 
wild will vastly exceed what we accumulate in 
clinical care,” says Kohane. “We're trying to create
these big collages of different data modalities
— from the genomic to the environmental
to the clinical — and link them back to the 
patient.” As these developments materialize, 
they could create computational crunches that 
will make today’s ‘big data’ struggles seem like 
pocket-calculator problems. And as scientists 
find ways to crunch the data, patients will be 
the ultimate winners.


Michael Eisenstein is a freelance science 
writer based in Philadelphia, Pennsylvania. 

KATE WITKOWSKA/GENOMICS ENGLAND


Q&A Mark Caulfield 
National genomics 


Mark Caulfield is chief scientist at Genomics England, which was set up in 2013 to deliver 

the UK 100,000 Genomes Project, initially focusing on cancers, rare diseases and infection. 
Caulfield, a cardiovascular clinician and researcher, spoke about the UK approach to big data 
in biomedicine and the role of Genomics England — including how it plans to embed genomic 
medicine in Britain’s National Health Service (NHS). 


What are the main challenges to integrating 
genomic medicine into clinical practice? 

The first challenge was to establish a plat- 
form that provides the capability and capac- 
ity to deliver the programme. To that end, 
we established 11 genomic-medicine centres 
across England. These are focused groups of 
clinicians, scientists and academics that ena- 
ble us to engage patients, enrol them, receive 
informed consent, and capture clinical data 
and samples to analyse. 

Another important issue is how to drive 
up the quality of the interpretation of those 
genomes. In partnership with the United 
Kingdom’s innovation agency, Innovate UK,
we have spent £10 million (US$15.5 million) of 
government money on stimulating companies 
to improve the quality of analysis. In Decem- 
ber 2014, we instituted a programme called 
the Genomics England Clinical Interpretation 
Partnership (GeCIP), which brings together 
researchers, clinicians and trainees from both 
the NHS and academia to improve the analysis 
of genomic data. The GeCIP covers specific
domains. For example, we already have one covering
haematological oncology, which comprises
all the people who work on leukaemia and 
lymphoma in the United Kingdom. 


How will your interpretations of the data feed 
into the health-care system? 

If there is an immediately actionable find- 
ing, such as a known pathogenic variation in 
a patient's genome, we send a clinical report 
directly to the appropriate NHS Genomic 
Medicine Centre. Clinicians then look at the 
data and perform their own validation steps to 
decide whether they think it is correct, before 
feeding it back to the patient. 

But that decision is always with the NHS. 
This is about creating a genomically enabled 
community of people who are looking at this 
data, are familiar with it, and are ‘owning’ the 
decision, as they would in the everyday clinical 
care of those patients. Embedding this auton- 
omy in the NHS will allow us to build a last- 
ing legacy after the initial Genomics England 
programme has finished. 

If we don’t find anything that is obviously
pathogenic, those genomes go off to the GeCIP 
domain relating to that patient's illness. This 
helps to drive up the accuracy of interpreting 
genomic information concerning the disease. In 
the meantime, we send reports to the patients,
updating them on the progress of the work. 


What is the role of industry? 

Having a vibrant genomics industry is in the 
best interests of patients and our community, 
and of course the total wealth of the country. 

We have created a consortium of ten com- 
panies ranging from small companies involved 
in diagnostics or analytics, through to the very 
large. We have invited those companies into a 
pre-competitive partnership to look at the first 
5,000 genomes with us. 

By ‘pre-competitive’, I mean that they work
together, they analyse the data, but they do 
not own any of the outputs, such as intel- 
lectual property. Genomics England owns 
these on behalf of the UK taxpayer. So
if something came up with commercial
potential, we would be willing to license
that on behalf of the UK taxpayer to third
parties, thereby creating the potential for
the United Kingdom to
draw inward investment in terms of realizing 
the potential of the resource. This also creates 
a framework for industry to come in and help 
shape the programme at the outset. 


To what extent do you depend on the NHS?

Hospitals and universities in the United Kingdom
are all part of one NHS, which allows
them to work together cohesively and share 
information freely — something that would 
not be possible in a highly competitive and 
fragmented environment. The NHS is a frame- 
work that operates at the level of the whole 
nation, and it is free at the point of delivery. So 
it makes a huge difference. 

The NHS allows us to conjoin academic 
researchers and the health-care system so 
that they can respond rapidly to each other’s 
needs — for example, the health-care system 
can receive requests to collect data and samples 
in real time and receive results back quickly. 


Do you expect this approach to dramatically 
speed up research? 

It takes an average of 17 years for discoveries to 
translate from the bench into having a health- 
care impact. We are seeking to do this in three 
years. You maximize your opportunity to do 
that if you juxtapose the health system and the 
researchers. For people who fund research, 
this is a hugely effective and efficient way of 
doing it. 

So I see this as a platform not just for a 
unique transformation of the UK health-care 
system, but as a model for health-care systems 
around the world.


INTERVIEW BY CLAIRE AINSWORTH 


This interview has been edited for length and clarity. 

BIG DATA IN BIOMEDICINE


Stanford researcher Michael Snyder analysed his own genome, RNA expression and protein production. 


PROTEOMICS 


High-protein research


The effort to catalogue proteins goes deeper in a push to 
make genetics research deliver practical benefits. 


BY NEIL SAVAGE 


When Michael Snyder used the tools
of ‘-omics’ on himself, he was in
for some surprises. Sequencing
his genome, for instance, he discovered that
he had a genetic predisposition for type 2
diabetes, even though he did not have any of
the standard risk factors, such as obesity or
family history of the disease. Over the next
14 months, Snyder, a molecular geneticist at
Stanford University in California, repeatedly
tested himself to monitor his RNA activity and
protein production¹.

When he contracted a respiratory virus 
midway through the study, he watched as his 
protein expression changed and biological 
pathways were activated. Then he was diag- 
nosed with diabetes — it looked to him as if 
the infection had triggered the condition. He 
also watched his proteins change during a bout
of Lyme disease. 

“I had no idea I’d turn out to be interesting,”
says Snyder, whose body has produced half a 
petabyte (500,000 gigabytes) of data so far. “It 
was just a proof of principle.” 

He has since expanded his study to 100 
people, collecting measurements from the 
proteome and 13 other ‘-omes’, including the
proteome and transcriptome of the micro- 
organisms that inhabit their bodies. He hopes 
that he and others can collect these deep pro- 
files from a million patients, and apply the 
tools of big data to tease out differences that 
predict disease and provide a finer-grained 
understanding of various conditions. He also 
hopes that they can break conditions down 
into subtypes by their proteomic profiles. 
“There are probably 100 different types of 
diabetes,” Snyder says.

Snyder’s experience shows the power of 
using ‘-omics’ to improve our understanding 
of biology, says William Hancock, a protein 
chemist at Northeastern University in Boston, 
Massachusetts. 




PRACTICAL GENETICS 

Genes provide the instruction manual for 
biological processes, but it is the proteins they 
create that turn those instructions into real- 
ity. Huge international efforts are under way 
to identify proteins, map their locations in tis- 
sue and cells, count how many are produced 
in particular circumstances, and describe the 
various forms they can take. And the oceans of 
data from these searches will uncover biomark- 
ers for diseases and provide targets for drugs 
to treat various conditions. By combining 
proteomics with genomics, transcriptomics, 
metabolomics and other ‘-omics’, scientists
may further deepen their understanding of 
biology on a molecular level. 

Proteomics brings genetic information to 
a practical level, says Gilbert Omenn, a bio- 
informatician at the University of Michigan 
in Ann Arbor and chair of the global Human 
Proteome Project (HPP). The idea of the pro- 
ject is to create a “complete parts list” of the 
human body, he says, “to fill in the many blank 
spots between knowing that a gene has some- 
thing to do with a disease process and knowing 
how it really works”. 

That is quite a parts list. The human body 
contains roughly 20,000 genes that are capable 
of producing proteins. Each gene can produce 
multiple forms of a protein, and these in turn 
can be decorated with several post-transla- 
tional modifications: they can have phosphate 
or methyl groups attached, or be joined to 
lipids or carbohydrates, all of which affect their 
function. “The number of potential molecules
you can make from one gene is huge,” says
Bernhard Küster, who studies proteomics at the
Technical University of Munich in Germany.
“It’s very hard to estimate, but I wouldn't be
surprised to have in one cell type 100,000 or
more different proteins.”


LEE ABEL/STANFORD DEPARTMENT OF GENETICS


SHI, Q. ET AL. PROC. NATL ACAD. SCI. USA 109, 419-424 (2012).


GLOBAL MAPPING 

Proteomics research is an international enter- 
prise. The Human Proteome Organization 
created two complementary HPP projects, 
both of which use mass spectrometry. One, 
the Chromosome-based HPP, divided the 24 
chromosomes among 19 countries. Japan, for 
example, is tackling chromosome 3 and the 
X chromosome, and Iran is studying the Y. 
The second, the Biology/Disease-driven HPP, 
is looking for proteins in specific tissues and 
organs, focusing on those that are relevant to 
diseases such as diabetes and colon cancer. A 
separate global project, the Human Protein 
Atlas, relies on antibodies with fluorescent 
molecules or other tags attached that bind to 
specific proteins to identify them. 

There are also some significant national 
efforts. China is investing heavily in proteom- 
ics research, with one example being a new 
national laboratory called PHOENIX, which 
was set to open in October with annual fund- 
ing of US$10 million. 

Whatever the technical approach, map- 
ping the human proteome is no easy task. The 
genome is simple in comparison — it is assem- 
bled with just four nucleic acids and changes 
little over a person's lifetime, except in the 
special case of cancer. Proteins, on the other 
hand, vary over time, changing during exer- 
cise, disease and menstrual cycles, for example. 
Another complication is that the most abun- 
dant protein can be about 10 billion times as 
common as the least. “You have one genome 
and you have a gazillion proteomes, depend- 
ing on the environmental situation,” says 
Hancock, who is co-chair of the Chromosome- 
based HPP. 

“There is no such thing as a human
proteome in one person, let alone in many people,”
says Küster. Last year, his group published
a draft map of a human proteome based on
16,857 mass-spectrometry measurements of 
human tissue, cell lines and body fluids. They 
also created a database, ProteomicsDB, to pro- 
vide analysis of the data. 


TOO MUCH DATA? 

Just figuring out how to handle the volume of 
proteomics data is tough. The Human Protein 
Atlas, for instance, collects images of tissues 
with tagged antibodies. Each image takes up 
tens of megabytes, and compressed JPEG files
about 10 megabytes in size are made available 
for online distribution. 

Meanwhile, the European Bioinformatics 
Institute (EBI) in Hinxton, UK, is creating 
ELIXIR, a distributed-computing infrastruc- 
ture designed to share proteomics and other 
biology data among research institutions in 
Europe. “ELIXIR doesn't want to create a huge
database; they want to link different groups
and different countries,” says Mathias Uhlén, 
a microbiologist at the KTH Royal Institute of 
Technology in Stockholm, Sweden. The EBI 
is already the repository for the Protein Iden- 
tifications (PRIDE) database, which collects 
mass-spectrometry data generated by multiple 
research groups. 

But scientists often disagree about whether 
to keep the raw data or throw it away. “The 
methods for identifying proteins from raw 
data are constantly improving, so it makes 
sense to keep the raw data if you can — but it 
does take lots of space,” says Conrad Bessant,
a bioinformatician at Queen Mary University 
of London. The argument on the other side, he 
says, is that “the field is advancing so quickly 
that why would you look at a five-year-old data 
set? You might as well run the analysis again, 
because the instruments are so much better.”


FILLING IN THE MAPS 

Proteome data are far from perfect, however.
In the issue of Nature last May in which
Küster’s group reported their results, another
group of scientists from the United States and
India published a draft map said to cover
about 84% of the protein-coding genes in the
human genome. Both maps were based on
mass spectrometry: an enzyme digests proteins
and produces peptide sequences about
7 to 30 amino acids long, and the mass of these
peptides is used to deduce the protein’s
composition. And both projects ended up reducing
the number of proteins they claimed to have
identified after other scientists called into
question some of their interpretations. Mass
spectrometry is a probabilistic method, says
Omenn, and there is no way to exclude the
possibility that two different proteins produced
the same peptide sequence.
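
The ambiguity Omenn describes can be illustrated with a toy digestion. The sketch below is only an illustration under stated assumptions: the article does not name the enzyme, so a simplified trypsin-like rule (cut after K or R unless the next residue is P) is assumed, and both protein sequences are invented for the example. Only the 7-to-30 residue length range comes from the text.

```python
import re
from collections import defaultdict

# Peptide length range quoted in the article.
MIN_LEN, MAX_LEN = 7, 30


def digest(protein):
    """Cut after K or R unless followed by P (assumed, simplified trypsin-like rule)."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return [p for p in pieces if MIN_LEN <= len(p) <= MAX_LEN]


def ambiguous_peptides(proteins):
    """Return peptides that more than one protein could have produced."""
    sources = defaultdict(set)
    for name, sequence in proteins.items():
        for peptide in digest(sequence):
            sources[peptide].add(name)
    return sorted(p for p, names in sources.items() if len(names) > 1)


# Hypothetical sequences, invented so that they share one peptide.
PROTEINS = {
    "protein_a": "MSDNKTAYIAQELLKGHHEAR",
    "protein_b": "PLLVVNRTAYIAQELLKWDDSTVPGK",
}

print(ambiguous_peptides(PROTEINS))
```

A peptide that appears in this list cannot, on its own, tell an analyst which of the two proteins was present in the sample, which is why identifications carry a probability rather than a certainty.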

The Human Protein Atlas’s antibody-based 
detection, on the other hand, is non-probabil- 
istic, as it tags individual proteins. The advan- 
tage of this approach, argues Uhlén, one of the 
creators of the Atlas, is that it shows precisely 
in which organs, tissues and even cells the pro- 
teins are located. “What we are providing is a 
map of where the proteins are,” Uhlén says. 
“That gives you hints about the function of 
the proteins.” 

Recent years have seen a push to develop 
microfluidic chips on which to perform 
antibody-based single-cell proteomics. This 
approach is particularly important when the 
cells of interest are rare, as in the case of circu- 
lating tumour cells. It also allows investigators 
to study differences between populations of 
the same cell type. For example, if one tumour 
cell makes many more copies of a particular
protein than its neighbour, or the proteins in
one cell have a methyl group attached whereas
those in another cell do not, this could explain
how the tumour develops drug resistance,
leading to possible targets for therapeutics.

A proteomics chip (top) profiles individually
labelled cells in its microchambers (bottom).

However, even the antibody approach has 
limitations, as some antibodies can bind to 
more than one protein, creating misleading 
results. “An even harder problem is knowing 
what data are of good quality and what are not,” 
says Uhlén. “When it comes to big data, it’s
easier to generate the data than to get knowledge
out of it.”

Then there are the missing proteins. 
Roughly 15% of human genes that should 
encode proteins have had no associated protein
identified, which means there are nearly
3,000 missing proteins. In some cases, this may 
be because they occur in small amounts or in 
only tiny areas of tissue. Without a complete 
catalogue of proteins, the overall picture of 
human proteomics remains fuzzy. 
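
The "nearly 3,000" figure follows from the two round numbers quoted in this article. A quick check, assuming the roughly 20,000 protein-coding genes and the roughly 15% share given in the text (both are approximations, so the result is too):

```python
# Round figures taken from the article; both are approximate.
PROTEIN_CODING_GENES = 20_000
MISSING_PERCENT = 15

# Integer arithmetic avoids any floating-point rounding in the estimate.
missing_proteins = PROTEIN_CODING_GENES * MISSING_PERCENT // 100
print(missing_proteins)
```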

Computing with incomplete or inaccurate 
data could lead researchers astray, Hancock 
worries. “Bringing biology and mathemat- 
ics together is a match made in hell,” he says. 
“Biology is wet and dirty and messy.” 

But as measurement techniques improve 
and scientists amass more findings, “the 
picture is going to get sharper and sharper,” 
Hancock adds. And the sheer volume of data 
available to sift through will continue to soar 
as measurement techniques improve. “We get 
all kinds of data from many different experiments,”
Bessant says. “It doesn’t take long until
you get hundreds of gigabytes or terabytes
of data.”


Neil Savage is a freelance science and 
technology writer based in Lowell, 
Massachusetts. 
COLLABORATIONS


Mining the motherlodes


Collaboration and competition are spurring on major
‘omic’ projects.


BY KATHERINE BOURZAC 


When actress Angelina Jolie
announced in 2013 that she’d had
a double mastectomy to reduce her
chance of developing breast cancer, after testing
positive for a genetic risk factor, the BRCA
genes responsible were all over the media.
These genes carry a significant risk: 55-65%
of women with a harmful BRCA1 mutation,
and 45% of women with a mutation in BRCA2,
develop the disease by the age of 70.

Jolie’s case involved a single gene, BRCA1,
that markedly increased the risk of a specific
disease, but the risks of developing genetic diseases
are usually much more complicated than
that. These complexities are being explored by
the many huge research efforts that have been
launched in recent years.

Collaborations involving hundreds of
scientists and computational biologists are
starting to make sense of genomics, proteomics
and a host of other ‘-omics’. Researchers are
tracing the twists and turns as thousands of
different forms of proteins are churned out
and modified (see ‘High-protein research’,
page S6). They are mapping the molecular
pathways that flow into or away from different
diseases, and are examining the effects of
other factors, such as bacteria, on the human
body (microbiomics). They are building and
testing algorithms to predict how all these
‘-omic’ signatures connect to human health.
And they are collaborating to share their
ideas and keep each other on track (see ‘New
eyes on the prize’).

These large studies make it possible to iden- 
tify and focus on risk factors for particular 
diseases. This research, which should enable 
more personalized treatment for individual 
patients, is creating huge data sets. Finding 
rare variations in the genome — and being sure 
they are not missing something — means sift- 
ing through the three billion base pairs in the 
genomes of tens of thousands of volunteers. To 
make it work, clinicians from across the world 
are working with bioinformaticians and com- 
puter scientists on a grand scale. 

In the process, these researchers also are 
evolving the art and science of collaboration 
in the era of big data. 


THE QUEST FOR BURIED TREASURE 

A disease-focused approach to the genome 
often involves so-called genome-wide asso- 
ciation studies, which are particularly well 
established in cancer research. In breast cancer, 
for example, genome-wide association studies 
have revealed about 90 variants — ‘typos’ in 
the genomic code — that are associated with 
the disease. Of these, only five occur in parts 
of the genome that code for proteins, says 
Sara Lindström, a genetic epidemiologist at
Harvard University in Boston, Massachusetts. 

The other 85 breast-cancer variants are 
mostly a mystery. “When you see one of these 
signals, it’s not clear if it increases disease 
risk, or if it’s just correlated with disease,” says 
Lindström. Sifting out the important variants
requires knowledge of what all these parts of 
the genome do. 

One of the biggest resources for computa- 
tional biologists tasked with sorting genomic 
cause from correlation in such puzzles is the 
Encyclopedia of DNA Elements (ENCODE). 
Launched in 2003, ENCODE is a mam- 
moth collaborative project funded by the US 
National Human Genome Research Institute, 
which maintains a publicly available, search- 
able genome database. 

In 2012, 442 researchers in 32 labs jointly 
released ENCODE papers that connected 
more than 80% of the human genome to 
specific biological functions and identi- 
fied more than 4 million regions where 
proteins hook up with DNA (see J. R. Ecker 
et al. Nature 489, 52-55; 2012 and references 
therein). 

“If you have a favourite gene, you can look
it up in ENCODE and find out what regions
are likely to regulate that gene,” says Michael
Snyder, a Stanford University geneticist and
one of the leaders of ENCODE. A breast-cancer
researcher, for instance, might find out that
a genetic variation uncovered in an association
study is a target for a particular transcription
factor, a protein that regulates gene expression.
That regulatory protein might then be a new
target for therapy.


TATIANA PLAKHOVA

Complementary approaches taken by 
researchers from 28 institutions are filling in 
this genome encyclopedia. Many participants 
study RNA, while some focus on transcription 
factors or on the regions of the genome where 
these regulatory elements attach. And still oth- 
ers carry out mapping and data analysis. 

Sometimes the sheer size of the ENCODE 
project can slow things down. A postdoc’s 
idea must be vetted by a larger group, for 
example, and sometimes researchers have to 
wait for other labs to finish their work before 
they can publish a paper, says Manolis Kellis, 
a computational biologist at the Broad Insti- 
tute in Cambridge, Massachusetts. 

But such problems are far outweighed by the 
benefits of working together, he says. When 
you work alone, “bugs can be introduced, and 
it often takes years to find them,” he says. That
does not happen in ENCODE: mistakes are
usually swiftly spotted by one of a large group
of colleagues. The collaborative structure also 
encourages standardization; researchers need 
to call a gene or regulatory element by the 
same name so that they can communicate, 
and so that the database is searchable and 
user-friendly. 


CANCER IN SEQUENCE 

This sort of standardization is essential when 
dealing with more complex data. The Interna- 
tional Cancer Genome Consortium (ICGC), 
set up in 2008, is trying to deal with this issue 
at the moment. 

The original goal of the project was to 
sequence the healthy and cancer genomes 
of 25,000 people. The initial sequencing 
efforts were performed only on the protein- 
coding parts of the genome. But consortium 
leader Tom Hudson, scientific director of 
the Ontario Institute for Cancer Research in 
Toronto, Canada, says that now the ICGC has 
collected about 2 petabytes (2 million giga- 
bytes) of data it plans to go much broader 
and deeper. 

The ICGC will now sequence the non- 
protein-coding parts of the genome that 
ENCODE specializes in, and include more 
clinical information about the patients. This 
Pan Cancer Analysis of Whole Genomes pro- 
ject will also bring in data from more people 
— the target is 250,000 — and sequence both 
their normal and cancer genomes. 

This scaling up in the size and scope of the 
project will be no mean logistical feat. So far 
the ICGC has brought together leaders from 
78 projects in 16 countries. In a pilot of the
larger whole-genome comparison project,
researchers are analysing paired tumour and
normal genomes from 2,600 people, which
amounts to about 0.7 petabytes, says Jan Korbel,
a computational biologist at the European
Molecular Biology Laboratory in Heidelberg,
Germany. This is large, but it is still possible
to use academic computer centres to process
the data.

But the group is at a crossroads. They either
need “vast investment” in academic data-centre
infrastructure for 250,000 genomes, says Korbel,
or they must figure out how to use cloud
computing for data sharing and analysis. “You
could have several clouds, each specific to a
country, as long as those clouds can ‘talk’ with
one another; that is, as long as comparative
analyses of data in one cloud with data from
another cloud are possible,” says Korbel.


NEW EYES ON THE PRIZE


Competitions find different ways to solve problems.


Tough problems often benefit from a fresh
pair of eyes. That was the thinking in June,
when the US National Cancer Institute (NCI)
launched a competition called ‘Up for a
Challenge’ to find new ways of analysing
breast-cancer data sets. The NCI gathered
data from several research groups and
is supplying them to teams that present
a reasonable proposal, agree to uphold
privacy standards, and meet other criteria.
The NCI has offered the winner a $30,000
prize and the opportunity to publish a paper
in PLoS Genetics.

Judges will score entries according to
how well groups use innovative methods to
find new genetic variants associated with
breast cancer, whether the findings can be
replicated, and whether they are consistent
with known cancer biology. The competition
will give extra points to competing groups
who formed new collaborations to work on
the problem. “We want to reach beyond
the usual suspects, and encourage a
greater diversity of people to work on these
problems,” says Elizabeth Gillanders, a
genetic epidemiologist at the NCI.

This is one of many competition-based
biomedical data projects. Among
the others is the DREAM Challenges
programme, set up to improve algorithm
development in systems biology by
Gustavo Stolovitzky, a computational
biologist at IBM in Yorktown Heights, New
York. The programme has expanded to ask
researchers to, for example, predict disease
progression and the effectiveness of drug
combinations in people with amyotrophic
lateral sclerosis.

In many cases, the best performers
do not have a background in the specific
biology involved. “Presented with a new
data set, they shine,” says Stolovitzky.


ANOTHER VIEWPOINT

In efforts like these, standardizing data so that
results from different groups are comparable
and searchable maximizes the pool of information.
This is important when hunting for
rare variations that can only be spotted by analysing
genomic data from tens of thousands or
hundreds of thousands of samples. Working
together also helps researchers to strengthen
their analyses, says Gustavo Stolovitzky, a
computational biologist at IBM’s Thomas J.
Watson Research Center in Yorktown Heights,
New York.

Although big-data analytics can reveal 
patterns and connections that are otherwise 
invisible, they can also support a researcher's 
pre-existing assumptions, thereby obscuring 
the truth. 

One common mistake is ‘overfitting’. Stolovitzky
likens this to preparing for a university
entrance exam by memorizing a big stack of 
difficult vocabulary flashcards. You can study 
hard and memorize all the words and their 
definitions, but that does not mean those 
words will be on the test — and if they are, the 
test may use a different wording that throws 
you off. 

Similarly, researchers who devise a predic- 
tive algorithm based on their own data set tend 
to make an algorithm that is good at predicting 
the results of their own study but fails to work 
on different data. 
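
The effect can be sketched in a few lines. Everything below is invented for illustration and is not taken from any study mentioned here: a made-up data set in which the true rule is "label 1 when x is at least 50", four training labels deliberately corrupted to stand in for noise, a "memorizer" that copies the label of the nearest training point, and a simple one-threshold rule.

```python
def true_label(x):
    """The underlying pattern in this toy example."""
    return int(x >= 50)


# Training points whose labels are deliberately flipped to act as noise.
NOISY = {10, 30, 70, 90}

train = [(x, 1 - true_label(x) if x in NOISY else true_label(x))
         for x in range(0, 100, 2)]
test = [(x, true_label(x)) for x in range(1, 100, 2)]  # new, unseen data


def memorizer(x):
    """Overfitted model: return the label of the closest training point."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label


def fit_threshold(data):
    """Pick the single cut-off that makes the fewest training errors."""
    def errors(t):
        return sum(int(x >= t) != y for x, y in data)
    return min((x for x, _ in data), key=errors)


threshold = fit_threshold(train)


def simple_rule(x):
    return int(x >= threshold)


def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


for name, model in [("memorizer", memorizer), ("simple rule", simple_rule)]:
    print(name, accuracy(model, train), accuracy(model, test))
```

The memorizer scores perfectly on its own training data because it has learned the noise along with the pattern, and it repeats those mistakes on the new data. The simple rule looks worse on the training set but holds up better on data it has not seen.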

Another problem is simply human nature. 
“When we analyse our own work, we are very 
benign,” says Stolovitzky. It is more useful to 
involve others, who may have ideas that would 
never have occurred to someone staring at the 
same data set all day. 

“Big data is not particularly useful if you 
don’t have analytics that you can trust,” says 
Stolovitzky. “We’ve seen that if you aggre- 
gate the results of several algorithms — as 
long as none of them are bad — the whole is 
greater than the sum of the parts.” That's just 
one more example of how, when researchers 
want to get the best results from biomedical 
big data, working together is crucial.
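
Stolovitzky's point about aggregation can be illustrated with a majority vote, which is one simple way to combine algorithms and not necessarily the method used in the DREAM Challenges. The predictions below are invented: three hypothetical algorithms that are each right three times out of four, but wrong on different items.

```python
from collections import Counter

# Invented ground truth and predictions, for illustration only.
truth = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "algorithm_a": [0, 0, 1, 1, 0, 0, 1, 1],  # wrong on items 0 and 7
    "algorithm_b": [1, 1, 1, 1, 0, 1, 1, 0],  # wrong on items 1 and 5
    "algorithm_c": [1, 0, 0, 1, 1, 0, 1, 0],  # wrong on items 2 and 4
}


def majority_vote(columns):
    """For each item, return the label that most algorithms chose."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*columns)]


def accuracy(predicted, actual):
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


ensemble = majority_vote(predictions.values())

for name, predicted in predictions.items():
    print(name, accuracy(predicted, truth))
print("majority vote", accuracy(ensemble, truth))
```

The vote helps here only because the three algorithms make their mistakes in different places. If they shared the same blind spots, or if one of them were much worse than the others, aggregation would offer far less.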


Katherine Bourzac is a science journalist 
based in San Francisco, California. 
Norman Sharpless of the University of North Carolina works with IBM Watson Health to analyse DNA data.


CANCER


Reshaping the cancer clinic


Big data’s war on cancer is still in the early stages, but the
front line is advancing.


BY CHARLIE SCHMIDT


The Cancer Genome Atlas, which catalogues
cancer mutations, contains some
2.5 million gigabytes of data. This giant 
project, run by the US National Institutes of 
Health, has vastly improved our understanding 
of various forms of cancer — but it holds 
relatively little information on the clinical 
experience of the patients who supplied the 
samples. 

At the other end of the cancer treatment
chain, electronic health records contain a
wealth of case-specific information that could
be used to improve cancer care. But more often
than not, such records are isolated in individual
hospitals and medical practices. As a result,
“most patient experiences are lost to research”,
says Clifford Hudis, an oncologist who specializes
in breast cancer at the Memorial Sloan
Kettering Cancer Center in New York.

In an effort to improve cancer treatment, 
Hudis and many others 
are now collaborating on 
efforts to bring together 
and make sense of the 
big data that emerge 
from research, patient 
care and clinical trials. Opportunities for big
data extend across most areas of medicine, but
“cancer is leading the way”, says Lynn Etheredge,
a health-care consultant based in Chevy
Chase, Maryland. But the ubiquity, variety and 
lethality of cancer mean that there are plenty of 
barriers as well as breakthroughs. 

Even so, Etheredge, who in 2007 wrote an 
influential article for Health Affairs calling for 
“rapid learning systems” to handle big data, 
believes we have entered a historic period for 
cancer research and treatment. “We know 
that cancer is a genetic disease, and we have 
the databases and the computational power 
needed to analyse them,” he says. 

Hoping to build on early successes with 
personalized cancer drugs, oncologists and 
computer specialists are working together to 
harness digitized information and apply it in 
the clinic. These emerging ventures are com- 
peting for business and are grappling with dif- 
ficult questions about privacy, data ownership 
and sustainable business models. “Big data is 
both a research tool and a proprietary com- 
modity,” Etheredge says. “It’s still early days 
in the field and there’s a lot that we need to 
work out.” 

Many organizations and approaches are 
bringing big data to the cancer clinic in the 
United States, which leads the world in some 
aspects of cancer treatment. Here we will con- 
sider four: a rapidly growing start-up company, 
a professional association's initiative, a com- 
puter giant’s cognitive computing and health- 
care wing, and a network of academic cancer 
centres. 


THE START-UP 

Launched in 2009 by scientists at the Broad 
Institute in Cambridge, Massachusetts, Foun- 
dation Medicine bills insurance companies for 
its analytical services. Academic and commu- 
nity oncologists submit patients’ tissue sam- 
ples, and Foundation Medicine sequences 
them. It then screens them for genomic can- 
cer drivers against its own growing database of 
molecular profiles (generated from more than 
50,000 cancer patients so far) and data from 
other public repositories. 

“The public databases aren't like Google — 
oncologists have no easy way to search them 
for genomic drivers that relate to their own 
patient’s tumour,” says Michael Pellini, chief 
executive of Foundation Medicine. “So we 
analyse the tissues and report back available 
therapeutic interventions, either in the form 
of a drug approved by the US Food and Drug 
Administration or a clinical trial.” 

Oncologists can also query Foundation 
Medicine’s client network for advice on dif- 
ficult cases. Within 72 hours, Pellini says, 
responses are aggregated and sent to the doc- 
tor, who can then gauge whether a particular 
drug or approach was effective. The company 
aims to make its client data more broadly avail- 
able for use in clinical decision-making. 

In January 2015, Swiss pharmaceutical giant
Roche spent US$1 billion on a 56% stake in 
Foundation Medicine, the largest corporate 
player in this sector, expecting revenue this 
year of more than $85 million. 


PRACTICE MAKES PERFECT 

In late 2015, the American Society of Clinical 
Oncology (ASCO) is expecting to launch Can- 
cerLinQ, a platform designed to deliver clini- 
cal benefits by analysing aggregated electronic 
health records from thousands of oncology 
practices. 

Oncologists will be able to interrogate 
CancerLinQ to see the effects of specific inter- 
ventions, to review how their own treatment 
approaches stack up against established care 
standards, and to develop hypotheses for fur- 
ther study. 

“Much of what we know about treating can- 
cer comes from clinical trials that enrol just 
3% of the patients diagnosed with cancer every 
year,” says Hudis, who serves on CancerLinQ’s
board of governors. “With CancerLinQ, we're 
trying to learn from the remaining 97% who 
don’t participate in these studies.” 

An initial group of 15 ‘vanguard practices’ 
of varying sizes are participating in the sys- 
tem, which ASCO expects to contain 500,000 
patient records by 2016. Researchers and 
clinicians will be able to query these records 
to compare patient outcomes by treatment. 
Aggregating such large amounts of data should 
help to reveal the effectiveness of particular 
drugs or approaches. 

“The most important thing that CancerLinQ 
can do is report on outcomes, for instance, that 
patients who received a particular treatment 
lived longer, or had slower progression of their 
disease,” says oncologist Robert Miller, medical 
director of ASCO’s Institute for Quality. These 
insights will benefit patient care and come at 
a time, he says, when Medicare, the leading 
US funder of cancer treatment, is shifting 
from fee-for-service reimbursement to alter- 
native payment models that reward better 
outcomes. 

A prototype of CancerLinQ was tested in a 
study of 170,000 breast-cancer patients in 2013. 
According to Miller, unpublished data showed 
that the system could highlight trends in data 
submitted by different medical practices — 
for example, how they stimulate the produc- 
tion of red blood cells to treat anaemia after 
chemotherapy. 

The platform extracts patient data from 
electronic health records, anonymizes and 
aggregates the data, and then integrates them 
with other types of information, including 
doctors’ notes and biomarker repositories. The 
goal is eventually to add point-of-care decision 
support to aid physicians with patients whose 
diagnosis and treatment is problematic. 
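
The extract, anonymize and aggregate steps can be pictured with a small sketch. This is not CancerLinQ's software, whose internals the article does not describe. The field names, the records and the outcome measure are all hypothetical, and real de-identification involves far more than dropping a few fields.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical list of fields treated as identifying in this sketch.
IDENTIFYING_FIELDS = {"name", "date_of_birth"}


def anonymize(record):
    """Return a copy of the record without the identifying fields."""
    return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}


def mean_outcome_by_treatment(records):
    """Average the (hypothetical) outcome measure for each treatment."""
    groups = defaultdict(list)
    for record in records:
        groups[record["treatment"]].append(record["months_progression_free"])
    return {treatment: mean(values) for treatment, values in groups.items()}


# Invented records standing in for data extracted from health records.
extracted = [
    {"name": "Patient One", "date_of_birth": "1950-01-01",
     "treatment": "drug_a", "months_progression_free": 10},
    {"name": "Patient Two", "date_of_birth": "1962-06-15",
     "treatment": "drug_a", "months_progression_free": 14},
    {"name": "Patient Three", "date_of_birth": "1971-11-30",
     "treatment": "drug_b", "months_progression_free": 8},
]

anonymized = [anonymize(record) for record in extracted]
print(mean_outcome_by_treatment(anonymized))
```

The aggregated summary is the kind of by-treatment comparison the platform is meant to support, and no individual record needs to leave the anonymized pool to produce it.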

CancerLinQ currently relies on donations,
but Miller says that in time it will sell effectiveness
reports and data-exploration tools to
make it more self-sustaining. “We are looking
at a range of CancerLinQ-related products and
services to help offset the operational costs of
the system,” says Miller.


COGNITIVE COMPUTING 

Big data needs big computing, and in 2013 
IBM formed a separate business unit — IBM 
Watson Health — to focus on commercial 
opportunities in cancer for its Watson cogni- 
tive computing system, which combines natu- 
ral language and learning capabilities. Watson's 
store of biomedical knowledge includes every 
abstract in the PubMed database (there are 
currently about 25 million and counting); the 
US National Cancer Institute's Drug Diction- 
ary (which has data on both approved drugs 
and those in clinical trials); the entire catalogue 
of somatic cancer mutations in the COSMIC 
(Catalogue of Somatic Mutations in Cancer) 
database, which is curated by the Wellcome 
Trust Sanger Institute, in Cambridge, UK; and 
data from many other sources. 

Watson, which gained fame in 2011 by
defeating human champions on the US television
quiz show Jeopardy!, also has access to
anonymized patient data. IBM Watson Health has
relationships with more than a dozen medical
practices, cancer centres and research
organizations, says Ajay Royyuru, director of
the Computational Biology Center at IBM
Research in Yorktown Heights, New York.

The New York Genome Center relies on 
Watson to screen DNA mutations in patients 
enrolled in a study of glioblastoma, an often 
fatal brain cancer. 

Physicians at the Memorial Sloan Ketter- 
ing centre and at the MD Anderson Cancer 
Center in Houston, Texas, are training Watson 
to become a clinical support tool, which entails 
presenting the computer with anonymized and 
hypothetical cases. For instance, a patient’s 
tumour might test positive for deficiencies in 
a gene called STK11 that may respond to the 
diabetes drug metformin, Royyuru explains. 
But Watson might not recommend metformin 
because this is an off-label indication. “That 
would be an instance in which it could be 
taught to cast a wider net,” Royyuru says.

Andrew Seidman, a breast-cancer specialist 
at the Memorial Sloan Kettering centre, adds 
that the use of Watson must be transparent, so 
that its reasoning can be easily critiqued. And 
Seidman cautions that Watson isn't ready for
prime time yet. “I'm taking a sober view, and I
say that as someone who's helping to develop
the technology,” he says. In particular, Watson's
capacity for natural language processing
remains a work in progress. For now, instead
of speaking to the computer directly, clinicians
have to enter the data manually.


NETWORK NEWS 

One of the major challenges facing cancer 
research is how to match patients with tar- 
geted drugs that act on rare mutations, because 
enrolling enough of these patients in clinical 
trials is not easy. But one group of hospitals has 
found a way to get round the problem. 

Launched in 2014 by the Moffitt Cancer 
Center, in Tampa, Florida, the Oncology 
Research Information Exchange Network 
(ORIEN) comprises nine academic cancer 
centres. Patients provide clinical data and tis- 
sue samples for analysis, and importantly agree 
to life-long follow-up, which allows patients 
to be recruited into new trials geared to their 
own genetic make-up. “It’s a much more pro- 
active way of doing research,” says Bill Dalton, 
ORIEN’s founding director. 

Moffitt developed the protocol, which it 
calls “total cancer care”, in 2003, and created
a company — M2Gen — to handle the analy- 
ses and tissue storage in 2006. The develop- 
ment of ORIEN gives this protocol a national 
reach, with about 130,000 people enrolled so 
far. Member centres share clinical and molecu- 
lar data, so they can collaborate on research 
questions. 


BIG PRICE TAGS 

Extracting clinical insights from big data, and 
using them to guide treatments, does not come 
cheaply, however. For example, Foundation 
Medicine charges nearly $6,000 to sequence 
and interpret the data from a single solid 
tumour, and more than $7,000 for a blood 
cancer. 

But this is dwarfed by the cost of new oncol- 
ogy drugs, which often have price tags of 
more than $100,000 per treatment or per year. 
In July, US Medicare agreed to pay for a leu- 
kaemia drug from Amgen that will cost about 
$178,000 per patient. 

Other countries may bargain far more 
aggressively with drug companies to bring 
down prices, or reject the drugs altogether 
on a cost basis, through agencies such as the 
UK National Institute for Health and Care 
Excellence. 

Ideally, this big money will buy big gains 
in personalized treatments and cures. This 
is certainly the hope of the US Medicare and 
Medicaid officials confronted with spending 
more than $13 trillion on health care dur- 
ing the coming decade, much of it on cancer 
therapy. These agencies will wield enormous 
power over the practicalities of bringing big 
data into the clinic. Issues relating to data busi- 
ness models and costs will apply across all areas 
of medicine, “but cancer is forcing them to the 
table now”, says Etheredge.


Charlie Schmidt is a freelance science writer 
based in Portland, Maine. 
Smartphone fitness apps enable researchers to gather health data from large numbers of people.


MOBILE DATA 


Made to measure 


Wearable sensors and smartphones are providing a flood of 
information and empowering population-wide studies. 


BY NEIL SAVAGE 


For decades, doctors around the world
have been using a simple test to measure
the cardiovascular health of patients.
They ask them to walk on a hard, flat surface
and see how much distance they cover in six
minutes. This test has been used to predict the
survival rates of lung transplant candidates,
to measure the progression of muscular
dystrophy, and to assess overall cardiovascular
fitness.

The walk test has been studied in many trials,
but even the biggest rarely top a thousand
participants. Yet when Euan Ashley launched
a cardiovascular study in March 2015, he
collected test results from 6,000 people in the
first two weeks. “That’s a remarkable number,”
says Ashley, a geneticist who heads Stanford
University’s Center for Inherited Cardiovascular
Disease. “We're used to dealing with a few
hundred patients, if we're lucky.”

Numbers on that scale, he hopes, will tell 
him a lot more about the relationship between
physical activity and heart health. The rea- 
son they can be achieved is that millions of 
people now have smartphones and fitness 
trackers with sensors that can record all sorts 
of physical activity. Health researchers are 
studying such devices to figure out what sort 
of data they can collect, how reliable those 
data are, and what they might learn when they 
analyse measurements of all sorts of day-to- 
day activities from many tens of thousands 
of people and apply big-data algorithms to 
the readings. 

By July, more than 40,000 people in the 
United States had signed up to participate in 
Ashley's study, which uses an iPhone application
called MyHeart Counts. He expects the
numbers to surge as the app becomes more 
widely available around the world. The study 
— designed by scientists, approved by institu- 
tional review boards, and requiring informed 
consent — asks participants to answer ques- 
tions about their health and risk factors, and 
to use their phone's motion sensors to collect 
data about their activities for seven days. They 
also do a six-minute walk test, and the phone 
measures the distance they cover. If their own 
doctors have ordered blood tests, users can 
enter information such as cholesterol or glu- 
cose measurements. Every three months, the 
app checks back to update their data. 

Physicians know that physical activity is a 
strong predictor of long-term heart health, 
Ashley says. But it is less clear what kind of 
activity is best, or whether different groups of 
people do better with different types of exer- 
cise. MyHeart Counts may open a window on 
such questions. “We can start to look at sub- 
groups and find differences,” he says. 

It is the volume of the data that makes such 
studies possible. In traditional studies, there 
may not be enough data to find statistically sig- 
nificant results for such subgroups. And rare 
events may not occur in the smaller samples, 
or may produce a signal so weak that it is lost in
statistical noise. Big data can overcome those 
problems, and if the data set is big enough, 
small errors can be smoothed out. “You can 
take pretty noisy data, but if you have enough 
of it, you can find a signal,” Ashley says. 
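
Ashley's point about noisy data can be illustrated with a general statistical rule of thumb, not with anything specific to MyHeart Counts: the standard error of an average shrinks with the square root of the number of participants. The Python sketch below is illustrative only. The function names, the 100-metre spread and the 10-metre difference are assumptions chosen for the example, and the calculation ignores statistical power and systematic bias in the sensors.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of a sample mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)


def participants_needed(sigma: float, difference: float, z: float = 1.96) -> int:
    """Smallest per-group size at which a difference between two group
    means is at least z standard errors of that difference.

    For two independent groups of size n, the standard error of the
    difference in means is sigma * sqrt(2 / n), so we need
    n >= 2 * (z * sigma / difference) ** 2.
    """
    if difference <= 0:
        raise ValueError("difference must be positive")
    return math.ceil(2 * (z * sigma / difference) ** 2)


# Hypothetical numbers: walk distances that vary by about 100 m between people.
for n in (100, 10_000):
    print(n, "participants -> standard error of", standard_error(100.0, n), "m")

# Per-group size needed to resolve a 10 m difference between two subgroups.
print(participants_needed(sigma=100.0, difference=10.0))
```

Under these assumed numbers, a hundredfold increase in participants cuts the uncertainty in the average tenfold, which is why subgroup differences that vanish in a study of a few hundred people can become visible in one of tens of thousands.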


AN APPLE A DAY 

Gathering that much data is possible because 
of Apple software called ResearchKit, which 
can be used to develop iPhone-based apps for 
such studies. MyHeart Counts was one of five 
apps that were launched on the same day that 
ResearchKit was released. The others are try- 
ing to harness the power of big data to study 
Parkinson's disease, breast cancer, diabetes 
and asthma. 

The Parkinson's study, which enrolled about 
16,000 people by July, also uses a walking test, 
because Parkinson's manifests as a movement 
disorder. People walk 20 steps in a straight line, 
and the phone's accelerometer and gyroscope 
measure their gait to assess their motor con- 
trol. They are also asked to say “Aaah” for 10 
seconds into the phone; measuring how much 
the voice quavers can help to tell doctors about 
their muscle tone. “It is very well fitted for 
using the sensors native to the mobile device,”
says John Wilbanks, an open-data advocate 
at Sage Bionetworks, a non-profit biomedical 
research consultancy based in Seattle, Washington,
that developed [...]
and can be linked to a fitness tracker to collect
even more data. 

Similar apps are being written for other 
smartphone operating systems, such as Win- 
dows and Android, and their associated smart 
watches. There has also been a proliferation 
of wearable fitness devices from various com- 
panies including Basis, Fitbit and Jawbone. 
Additionally, researchers are developing other 
types of wearable sensor to collect data over 
time, including temporary tattoos and con- 
tact lenses that measure glucose levels in tears. 
Meanwhile, existing devices, such as continu- 
ous glucose monitors for people with diabetes, 
are rapidly evolving and adding their data to 
the mix on smartphones. 

Researchers are now trying to use smart- 
phones to go beyond measuring physical fit- 
ness. Some, for instance, track mental state 
and emotional health, by listening to the 
sound of a person's voice to identify stress, 
or by tracking their movement to determine 
their social interaction to figure out if they may 
be depressed. 

As portable devices are increasingly used 
to measure a whole range of human activity, 
and computers are now powerful enough to 
sift through this mountain of data, research- 
ers are hoping to obtain unprecedented insight 
into human health. 


MEASURING UP 

The wide variety of measurements from an 
ever-growing array of devices leaves research- 
ers having to figure out how to handle it all. “It’s 
just an exciting mess,” says Ida Sim, co-director 
of the biomedical informatics division at the 
University of California, San Francisco. 

Sim is a co-founder of Open mHealth, a 
non-profit company that is developing software
to help clean up the mess by standardizing,
storing and processing data collected
from a variety [...] to put data together
accurately,” she says.
For a doctor to correctly interpret a glucose 
reading, for instance, it is important to know 
whether that person had been fasting for a 
period of time. 

Any effort to establish standards must 
address two critical questions. How accurate 
are the readings from these devices? And what 
exactly is being measured? Today’s fitness 
trackers are designed to tell users whether they 
have walked more this week than last week, 
say, not to collect laboratory-quality meas- 
urements. “What they know is general move- 
ments, which they try to convert to steps, some 
better than others,” says Stephen Intille, who
studies personal health informatics at North- 
eastern University in Boston, Massachusetts. 


Researchers at Northeastern University calibrate 
sensors during a range of activities in the lab. 


To get a better sense of what devices are actu- 
ally measuring, Intille brings volunteers into 
his lab and attaches various sensors to both 
arms and both legs — not only the commer- 
cial devices, but other, laboratory-calibrated 
sensors that record movement, heart rate, 
breathing and other data points. For 2-3 hours, 
he takes readings as the volunteers walk, do 
chores, ride a bicycle and carry out similar 
activities. Intille then removes some of the 
sensors and sends the person home, where 
the remaining devices collect real-world data 
for another couple of days. For the next three 
months, he cuts back to the one or two devices 
he is studying. 

This way, Intille can see precisely what the 
commercial devices are recording during a 
particular activity. For instance, a Fitbit moni- 
tor may produce a certain set of readings when 
a person is ironing clothes, while the lab equip- 
ment records heart rate and breathing. If the 
computer can be trained to recognize how dif- 
ferent activities produce different Fitbit read- 
ings in the lab, it may also be able to identify 
those activities in the real world and analyse 
their impact on physical fitness. 
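
One simple way to picture this kind of training is a nearest-centroid classifier: average the tracker readings recorded for each activity in the lab, then label a new reading by whichever average it sits closest to. The Python sketch below is an assumption-laden illustration, not Intille's method. The two features (steps per minute and a wrist-motion score) and every number in it are invented for the example.

```python
import math
from collections import defaultdict


def train_centroids(labelled_readings):
    """Average the feature vectors recorded for each activity in the lab."""
    sums = {}
    counts = defaultdict(int)
    for activity, features in labelled_readings:
        if activity not in sums:
            sums[activity] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[activity][i] += value
        counts[activity] += 1
    return {
        activity: [total / counts[activity] for total in totals]
        for activity, totals in sums.items()
    }


def classify(centroids, features):
    """Return the activity whose lab average is closest to the new reading."""
    return min(centroids, key=lambda activity: math.dist(centroids[activity], features))


# Hypothetical lab session: (activity, [steps per minute, wrist-motion score]).
lab_readings = [
    ("walking", [100.0, 0.6]),
    ("walking", [110.0, 0.7]),
    ("ironing", [5.0, 0.8]),
    ("ironing", [3.0, 0.9]),
]
centroids = train_centroids(lab_readings)

# Hypothetical reading collected later at home.
print(classify(centroids, [8.0, 0.75]))
```

A real system would need far richer features and would have to cope with readings that match none of the lab activities, which is one reason Intille expects to need input from the wearer.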

“I don't personally believe these things are 
ever going to work really well without some 
interaction with the end user,” says Intille. He
wants a phone to tap into data from a fitness 
tracker and, having learned something about 
the individual's habits, ask questions, such as: 
“Are you walking the dog right now?” For a 
fuller picture, he says, people would need to 
wear more than one device, perhaps one on 
the wrist and another on the ankle. 

Such detailed information will be needed for 
researchers to obtain a broader understand- 
ing. It’s easy to have people report how far and 
how frequently they run, for example, or how 
intensely they work out at the gym, but little is 
known about the effect of day-to-day activity 
on people’s health, says physiologist William 
Haskell, emeritus professor of medicine at the
Stanford Center on Longevity. 

“We don't know a lot about the light-intensity 
range, from standing and just walking about,”
says Haskell, who has collaborated with Intille 
and worked to validate the measurements from 
commercial trackers. “How useful is a standing 
desk, where you get up and just stand for three 
hours a day, versus where you have a nice walk 
around the office? We just don't know.” 

Haskell started using accelerometers to track 
physical activity 40 years ago, and he is excited 
about the possibility of learning from wearable 
devices. “We think the technology is here,” he 
says. “We just need to validate it and use it to 
look at a 24-hour activity cycle.”


WEAR NEXT? 

Obtaining vast amounts of data can improve 
the power of fitness studies, but wearable 
technologies also open up the possibility of 
collecting different kinds of data that were not 
previously available: the long-term, round-the- 
clock monitoring of people just going about 
their business. 

“A lot of the promise of big data is that you're
not just looking at a lot of data, but you're 
looking at a lot of data from a lot of different 
sources,” says Sim. 

When she sees patients, she interacts with 
them for about 20 minutes. “For all the time 
they’re not in my clinic, I'm completely blind,” 
she says. “I have no idea what’s going on in 
their lives.” Constant data collection could 
ultimately change that equation and help doc- 
tors tailor their care to individual patients. 
Right now, though, Sim says that there is still a 
crucial missing link: no one has yet designed a 
way to send meaningful data from commercial 
devices to doctors. “It’s not built to fit into the 
physician's workflow at all,” she says. 

But such pervasive gathering of health 
information could also offer broader societal 
benefit. Data from thousands of individuals, 
collected unobtrusively with technology that 
is increasingly ubiquitous, could allow for 
population-wide studies of factors that can 
affect health. Ashley envisages a mobile-health 
version of the decades-long Framingham Heart 
Study, which has helped to identify risk factors 
for heart disease. He has already started to link 
the data he is collecting from iPhones with 
genomic data, which he collects from users who 
are patients at Stanford Medical Center. 

Intille believes that as bigger data sets are 
created, health researchers will be able to 
answer a whole range of new questions. “At the 
individual level we just haven't had any data 
like this at all,” he says. “It’s simply not possible 
to detect it until you have mobile devices. It’s 
totally different from the way we've dealt with 
health and medicine in the past.”


Neil Savage is a freelance science and 
technology writer based in Lowell, 
Massachusetts. 
DEEP PHENOTYPING


The details of disease

Precision medicine demands precise matching of deep 
genomic and phenotypic models, and the deeper you go,
the more you know.


BY CATHRYN M. DELUDE 


Twenty years ago, amid an explosion of
optimism that sequencing the human
genome would lead to precision medicine,
Isaac Kohane sounded a note of caution.
Yes, gene sequencing was a major step forward.
But wringing clinical value from the flood of
genomic information, he said, would depend
on the more pedestrian practice of phenotyping:
clinically characterizing traits that
signify health or disease, such as a fever, a rash,
a limp or an irregular heartbeat.

“Science is informed by what it is possible 
to measure, and it takes a great leap forward 
when we can measure something new,” says
Kohane, a bioinformatician at Harvard Medi- 
cal School in Boston, Massachusetts. “Previ- 
ously it was hard to measure differences in 
genome sequences among individuals. Now 
that’s been reduced to a commodity.” 

But measuring different phenotypes in 
diabetes, for instance, still requires someone
to comb medical records for data on metrics
such as weight, blood pressure and blood glu- 
cose levels — a tedious and expensive exercise. 
Moreover, new forms of measurement, such 
as continuous glucose monitoring, that may 
provide valuable clues to disease may not be 
included in these records. 

Precision medicine requires an understand- 
ing of the precise relationship between gene 
and phenotype, and the stratification of dis- 
eases into subtypes according to their under- 
lying biological mechanisms. But researchers 
do not know the functions of most genes, and 
what they do know is limited to a few cell types, 
tissues or physiological contexts. Furthermore, 
descriptions of disease phenotypes often fail 
to capture the diverse manifestations of com- 
mon diseases or to define subclasses of those 
diseases that predict the outcome or response 
to treatment. Phenotype descriptions are typically
“sloppy or imprecise”, according to a 2012
review.

‘Outbred’ mice are used to reveal the
genetic diversity that underlies disease.


Overcoming these difficulties requires an
exhaustive examination of the discrete
components of a phenotype that goes beyond
what is typically recorded in medical charts. 
Such ‘deep phenotyping’, as it is known, gathers
details about disease manifestations in a
more individual and finer-grained way, and 
uses sophisticated algorithms to integrate the 
resulting wealth of data with other kinds of 
information. 

Historically, phenotyping has not repre- 
sented big data. It has been partial, generic and 
time-consuming to gather. Information about 
individual phenotypes has not been matched 
to genetic variations among individuals. Deep 
phenotyping will provide more specificity, 
new types of big data, and potential connec- 
tions between disease subtypes and genetic 
variations. 

This approach will allow researchers to 
address new questions. What is the specific 
pattern of protein expression or gene regula- 
tion in the diseased cells? What about the cells’ 
metabolites and other biochemistry? Are there 
unusual gut bacteria? Does the patient have 
other seemingly unrelated conditions, such as 
autoimmunity or a psychiatric disorder, that 
might share a biological pathway? This com- 
prehensive deep-phenotyping information, 
in combination with other big data such as 
genomic data, can reveal the precise underly- 
ing mechanisms of each individual’s disease. 
As Kohane says, deep phenotyping “shows the 
different dimensions of the disease”. 


DIVIDING DIABETES 

Diabetes exemplifies the problem of impre- 
cise phenotypes. “There are a hundred ways 
to be diabetic, involving different processes 
in the pancreas, liver, muscle, brain and fat,” 
says Gary Churchill, a mouse geneticist at the 
Jackson Laboratory in Bar Harbor, Maine. 
“Genetic studies lose statistical power by look- 
ing at a conglomeration of underlying causes.” 
Different genes are responsible for particular 
subtypes of diabetes, so mixing them together 
obscures the reasons why people with the same 
genetic mutation respond differently to the 
same treatment. 

“There are many steps between causal gene 
and phenotype at the level of body weight and 
blood sugar,” says Alan Attie, a biochemist at
the University of Wisconsin-Madison who col- 
laborates with Churchill. “Each step is subject 
to genetic variation, which can weaken links 
between gene and phenotype.” 

Attie is looking at how individual genomic 
differences affect one particular phenotype of 
diabetes: insulin secretion [...]
glucose, but also to fatty acids, amino acids and
other molecules that affect insulin secretion. 
Preliminary data reveal significant variation 
among islet cells. 

Churchill says that studying ‘outbred’ mice, 
rather than inbred strains that have identi- 
cal genomes, better mirrors human diversity 
in diseases such as diabetes that have many 
genetic contributors. For instance, B6 mice, 
a commonly used inbred strain, would all 
get diabetes when they become obese for the 
same reason. “If we only studied that mouse, 
the findings would translate to some human 
patients but we wouldn't see the breadth of
other causes,” he says. 


BRAIN WORK 

Combining deep phenotyping with big ‘omic’
data is far from straightforward. And the link 
between gene and phenotype is particularly 
precarious in neuropsychiatric disorders such 
as autism. 

“Precision medicine? That’s not about us. 
We barely know how to do medicine,” says 
Steven Hyman, a neuroscientist at the Broad 
Institute in Cambridge, Massachusetts. “In 
psychiatry, we only have descriptive phenotypes,”
he says, not mechanistic ones that
reveal what has gone awry in the brain. Taking 
a deep-phenotyping approach to neuropsy- 
chiatric disease might break the current 
impasse in progress to better treatments, 
says Hyman. 

Most brain disorders are polygenic, with dif- 
ferent combinations of gene mutations causing 
disease in individual patients, so identifying 
genes still fails to explain the majority of cases. 
For autism, fewer than 10% of cases are linked 
to genes that might explain the underlying dis- 
ease mechanism. And an autism gene could 
also be involved in schizophrenia, obsessive- 
compulsive disorder and bipolar disorder, says 
Guoping Feng, a neuroscientist at the Massa- 
chusetts Institute of Technology in Cambridge. 
“Some symptoms are unique to each disorder, 
but other symptoms overlap.”

Furthermore, although most people with 
autism share core symptoms (such as repetitive 
behaviours and social deficits), some also have 
irritable bowel syndrome, infections, seizures, 
schizophrenia or attention deficit hyperactiv- 
ity disorder. “We should consider not just neu- 
rology and behaviour, but other diagnoses the 
patient has, such as inflammation and heart 
disorders,” says Kohane. “Defining these subclasses
is a prerequisite for precision medicine.”

Steve Brown, a mammalian geneticist at the 
Medical Research Council centre at Harwell, 
UK, hopes that his work with the International 
Mouse Phenotyping Consortium can untangle 
such complications. The consortium is system- 
atically phenotyping a knockout mouse strain 
for every gene in the mouse genome. 

“We can’t look at just one or two pheno- 
types because we don’t know the function 
of most genes,” Brown says. “We can’t make
assumptions about what to look for.” Researchers
test each mouse for sensory perception,
cardiovascular and lung functions, metabo- 
lism, morphology and pathologies, and record 
environmental conditions and diet. They also 
record behavioural data on activity, social 
interactions, grooming, sleeping and feeding. 

The consortium’s knockout mice are all from 
an inbred strain, which limits the exploration 
of natural diversity but enables comparative 
studies and replication of findings. “We never 
expect to create a model of autism or schizo- 
phrenia,” Brown says. Instead, the goal is to 
establish baselines for what each gene does 
and how it might affect behaviour. 


THE LIMITATIONS OF MODELLING 

Those who are performing deep phenotyp- 
ing in animal models acknowledge the fun- 
damental limitations of modelling disorders 
in non-human species, however. “Human 
neuropsychiatric disorders involve the pre- 
frontal cerebral cortex, which is a recent 
arrival in evolution,” Hyman says. “Many 
important cells and circuits in the human 
cerebral cortex simply aren't there in mice.” 
Scientists should focus on cells and molecu- 
lar mechanisms that are shared by mice and 
humans, he says. 

“Too many studies start with a transgenic 
mouse that is, say, lousy at building nests, 
decide it models schizophrenia or autism, and 
draw conclusions about the molecular mechanisms
of disease,” he adds. “It should work the
other way round.”

Walker Jackson, a prion-disease researcher 
at the German Center for Neurodegenerative 
Diseases in Bonn, Germany, studies how single
amino-acid mutations [...] familial insomnia in
mice. Jackson meas- 
ures behaviours to 
understand the natural history of the diseases, 
but stops short of seeking a genetic link. “I'm 
not trying to see how a mutation connects to 
behaviour because it’s hard to know what is 
changing behaviour,” he says.

He finds that the same mutation affects 
some neurons but not others, and wants to 
understand how non-diseased neurons com- 
pensate for the mutation to reveal targets for 
therapy. These effects occur in the hippocam- 
pus, cerebellum and thalamus — all regions 
linked to the behavioural symptoms seen in 
these disorders. “The data are showing us that 
the disease is more complex than we thought,” 
Jackson says. “Affected neurons show dysfunc- 
tion in different ways, so therapy that works in 
one type of neuron may not work in others.” 

Similarly, researchers at Stanford University
Medical School in California started with
a single mutation in the NL3 gene that has
been directly linked to some cases of human 
autism — a rare occurrence in psychiatric 
illness. They inserted this mutation in mice 
and traced its effect on motor behaviour to 
impaired dopamine inhibition in certain neurons
in an unexpected brain region.

Feng used a similar approach to iden- 
tify neural circuitry imbalances caused by 
another autism gene (Shank3) in mice. But
this method cannot be widely used because 
most disorders involve myriad genes, each 
with a small effect. “I don't think deep pheno- 
typing a mutant mouse’s behaviour alone will 
give us great insight,” Feng says. But studying 
cells derived from humans might help, he 
suggests, because “these cells already have the 
perfect combination that can cause disease in 
a person.” 


THE HUMAN TOUCH 

Given the limitations of animal studies, and 
the advantages of studying illnesses directly in 
human cells, deep phenotyping is now extend- 
ing to research on new human cell models of 
complex diseases. Neuropsychiatric research- 
ers, for example, can induce skin cells to form 
stem cells, and can differentiate them into neu- 
rons or self-assembled clusters of cells called 
organoids, so they can study the connections 
between phenotypes, genomics and related 
biological data. 

Kohane is leading one such project, called 
N-GRID, which collects cells from patients 
with neuropsychiatric disorders to look for 
links between individual genomes and tran- 
scriptomes, proteomes, patterns of DNA 
methylation and other epigenetic markers 
that affect gene expression, responses to small 
molecules, and clinical features. The project’s 
deep-phenotyping approach includes “whatever
we can measure, to see if distinctive subsets
emerge”, Kohane says. The aim is to build
a “more robust scheme of classifying neuropsy- 
chiatric disease — one that is more reliable 
with regard to prognosis of these diseases, 
more insightful as to the biological aberration 
in each category and, therefore, more effective 
in treating the patient”. 

Hyman proposes that researchers should 
consider reserving animal models for safety 
and pharmacokinetics studies. The efficacy 
of a new therapy could be tested instead in 
engineered human cell cultures or organoids. 
“What if we can’t have a mouse model of 
schizophrenia?” he asks. This should not stop 
the quest for safe, effective therapies — and if 
animal models cannot provide good readouts 
on efficacy, deep phenotyping of human cells 
might well fill the gaps.


Cathryn M. Delude is a science writer based 
in Andover, Massachusetts. 
PERSPECTIVE


Sustaining the big-data ecosystem 


Organizing and accessing biomedical big data will require quite different business models, 
say Philip E. Bourne, Jon R. Lorsch and Eric D. Green. 


Biomedical big data offer tremendous potential for making discoveries,
but the cost of sustaining these digital assets and the
resources needed to make them useful have received relatively
little attention. Research budgets are flat or declining in inflation-adjusted
terms in many countries (including the United States), and
data are being generated at unprecedented rates, so the research community
must find more efficient models for storing, organizing and
accessing biomedical data. Simply putting more and more money into
the current systems is unlikely to work in the long term.

To better understand this situation, we are examining the current
and projected costs of managing biomedical data at the US
National Institutes of Health (NIH). Our initial analyses indicate
that even if we leave out the National Center for
Biotechnology Information, which is a special
case, the 50 largest NIH-funded data resources
have a collective annual budget of US$110 million.
And this figure represents just the tip of
the iceberg for future needs.


UNDERSTANDING USAGE 

Today’s biomedical data resources typically treat 
all items in their collections equally. This does 
not always make sense, given that the usage 
patterns of the data vary. But how do we decide 
which data get more attention? As larger and 
larger data sets are generated more easily, and 
the cost of maintaining and annotating these 
data continues to rise, this question is becoming 
increasingly important. 

Answering it requires a better understanding of 
how research data are used. This has rarely been 
thoroughly explored. Historically, funders have 
been interested primarily in knowing how the data resources that they 
support are used and by whom. They tended not to look closely at the 
details of how and why individual items and types of data within a col- 
lection are used. 

Analyses of these details can be revealing. Preliminary studies 
suggest that typically a small subset of the data is used frequently, 
whereas most of the data are rarely accessed. However, the exact subset 
of data that is used heavily may change over time, and most of the data 
access may be performed after the data are downloaded, so this is not 
recorded. All of this means that absolute numbers are hard to interpret.

These caveats notwithstanding, more details of data usage are 
needed to inform funding decisions. Over time, such usage patterns 
could tell us how best to target annotation and curation efforts, estab- 
lish which data should receive the most attention and therefore incur 
the largest cost, and determine which data should be kept in the longer 
term. The cost of data regeneration can also influence decisions about 
keeping data. 
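
As a rough illustration of the kind of usage statistic such decisions might draw on, the Python sketch below computes what share of recorded accesses goes to the most heavily used fraction of a collection. It is a minimal sketch under stated assumptions: the function name, the item identifiers and the access log are all hypothetical, and, as noted above, a server-side log of this sort would miss any use that happens after the data are downloaded.

```python
from collections import Counter


def usage_concentration(catalogue, access_log, top_fraction=0.1):
    """Share of all recorded accesses that go to the most-used items.

    catalogue: every item identifier held by the resource, used or not.
    access_log: one identifier per recorded access.
    top_fraction: the slice of the catalogue to treat as 'heavily used'.
    """
    counts = Counter(access_log)
    ranked = sorted((counts.get(item, 0) for item in catalogue), reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    top_n = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:top_n]) / total


# Hypothetical resource holding ten data sets, with ten recorded accesses.
catalogue = [f"dataset-{i}" for i in range(10)]
access_log = ["dataset-0"] * 8 + ["dataset-1", "dataset-2"]

share = usage_concentration(catalogue, access_log, top_fraction=0.1)
print(f"Top 10% of items received {share:.0%} of recorded accesses")
```

Tracking such a figure over time, rather than as a single snapshot, would also show whether the heavily used subset is shifting.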

Funders should encourage the development of new metrics to ascer- 
tain the usage and value of data, and persuade data resources to pro- 
vide such statistics for all of the data they maintain. We can learn here 
from the private sector: understanding detailed data usage patterns 
through data analytics forms the basis of highly successful companies 
such as Amazon and Netflix. 


FAIR AND EFFICIENT 

When we have a better understanding of data usage, we can develop 
business models that consider supply and demand, and develop sus- 
tainable practices. In addition, finding economies of scale and harness- 
ing market forces will be essential. 

For a typical biomedical data resource, the cost of simply keeping the
data is only a small fraction of the total cost of data management. The 
remainder is largely the cost needed to support the finding, accessing, 
interoperating and reusing (the FAIR principles; see go.nature.com/axkjiv)
of the data, a cost that is widely underappreciated.

Is the FAIR fraction of the cost justified? Are 
services from different data resources redun- 
dant? Are resources subject to ‘feature creep’ — 
the addition of costly ‘bells and whistles’ that are 
of limited value? Do our funding mechanisms 
contribute to these problems? And most impor- 
tantly, is the way we currently maintain biomedi- 
cal data optimal for the science that needs to be 
done both today and in the future? 

Current practices typically use many disparate 
sources of data to conduct a study. These data 
are located in a variety of repositories, often with 
different modes of access. This lack of centraliza- 
tion and commonality may hinder their ease of 
use and reduce productivity. We need a better 
understanding of usage patterns across multiple 
data resources to use as a basis for redesigning 
such resources to preserve valuable expertise 
and curation, and for improving how the data are found, accessed, 
integrated and reused. 

The nature of curation and the quality assurance for biomedical 
data must also change. Complete and accurate automated or semi- 
automated extraction of literature is needed to provide metadata and 
annotation. We should consider crowdsourcing curation, with appro- 
priate validation and incentives. Additionally, the role of professional 
curators must be better appreciated by data users, by the institutions 
where the curators work, and by the funders. 
In the longer term, we need models that are better aligned with
the research life cycle. There is an unnecessary cost in a researcher 
interpreting data and putting that interpretation into a research paper, 
only to have a biocurator extract that information from the paper 
and associate it back with the data. We need tools and rewards that 
incentivize researchers to submit their data to data resources in ways 
that maximize both quality and ease of access. 


BUSINESS MODELS 

One business model worth exploring is the ‘freemium model’. Here,
the primary data are available free of charge, but services that add value 
to these data have an associated charge that generates funds that are 
used to maintain the primary data. This approach is used in other 
disciplines, notably chemistry. But there are two knotty questions. 
Should for-profit institutions be charged the same as non-profits? 
And who should own the intellectual property associated with value- 
added content? 

Another potential business model is the ‘subscription model’, which 
is used to access the genetic and molecular databases that are pro- 
vided by The Arabidopsis Information Resource (TAIR), for example. 
This option delivers support for a data resource from its active users, 
but it restricts access, which may be problematic for public-access 
data policies. 

Taking the business-model idea further, what happens if data 
resources are merged, acquired or go out of business? Would existing 
resources be more useful and cost-effective if they were merged in 
some way? Should some services be dropped owing to lack of demand 
to make way for new services? Would reducing funding for particular 
data resources over time promote increased efficiency? To answer such 
questions, we would benefit from advice and help from the private 
sector and from other scientific communities. 


COMMON GROUND 

Cloud computing creates an element of data virtualization, takes 
computing to the data, and may help to solve some of the problems 
facing biomedical big data. At the NIH, we propose to exploit these 
opportunities by creating a ‘commons’ as one possible sustainable model. 

Physically speaking, the commons will be collections of public and 
private resources (including cloud resources) for storing data and 
computing with those data. To be commons-compliant, such resources 
must abide by two simple rules. First, each research object in the com- 
mons — for example, data, software, narratives or papers — must be 
uniquely identified, sharable (taking into account privacy issues), and 
resolvable to its source by using a common identifier. Second, each 
research object must be defined by a minimal amount of metadata, as 
defined by the community. 
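
As a rough illustration of the two rules above, the following Python sketch models a research object and a toy in-memory commons that checks compliance before registration. Everything named here is an assumption made for illustration: the `ResearchObject` fields, the `Commons` class and the `REQUIRED_METADATA` set are hypothetical, and the real minimal metadata would be whatever the community agrees on.

```python
from dataclasses import dataclass, field

# Hypothetical minimal metadata; the article leaves the real list to the community.
REQUIRED_METADATA = {"title", "creator", "created"}


@dataclass
class ResearchObject:
    identifier: str          # common identifier (format is an assumption)
    kind: str                # e.g. "data", "software", "narrative" or "paper"
    source: str              # location that the identifier resolves to
    metadata: dict = field(default_factory=dict)
    shareable: bool = True   # set to False when privacy constraints apply


class Commons:
    """Toy in-memory index of research objects (illustrative only)."""

    def __init__(self):
        self._objects = {}

    def check(self, obj):
        """Return a list of reasons why obj breaks the two rules."""
        problems = []
        # Rule 1: uniquely identified and resolvable to its source.
        if not obj.identifier:
            problems.append("missing identifier")
        elif obj.identifier in self._objects:
            problems.append("identifier is not unique")
        if not obj.source:
            problems.append("identifier does not resolve to a source")
        # Rule 2: described by the community-defined minimal metadata.
        missing = REQUIRED_METADATA - set(obj.metadata)
        if missing:
            problems.append("missing metadata: " + ", ".join(sorted(missing)))
        return problems

    def register(self, obj):
        problems = self.check(obj)
        if problems:
            raise ValueError("; ".join(problems))
        self._objects[obj.identifier] = obj

    def resolve(self, identifier):
        """Map a registered identifier back to its source."""
        return self._objects[identifier].source
```

In this sketch, a resource would call `register` for each object it holds, and an index could use `resolve` to point users back to the source.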

The NIH Big Data to Knowledge (BD2K) programme (bd2k.nih.gov) 
aims to bring about the creation of the commons. The 12 new 
BD2K centres are encouraged to share research objects within the 
commons, and a BD2K consortium is prototyping an index that makes 
it easy to find commons content. 

We also are studying the notion of computing credits, in which a 
grant recipient is given credits instead of funding to pay for compu- 
tational time. A principal investigator would be able to spend those 
credits at any commons-compliant resource. Researchers whose work 
involves extensive computation on small amounts of data may spend 
their credits at a different commons-compliant resource to investiga- 
tors who do minimal computing on large amounts of data. 
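
The credit idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Provider` and `CreditAccount` classes, the two price dimensions (CPU hours and terabyte-months of storage) and the helper `cheapest` are all hypothetical and are not part of any NIH scheme described here.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    """A commons-compliant resource with hypothetical credit prices."""
    name: str
    credits_per_cpu_hour: float
    credits_per_tb_month: float

    def quote(self, cpu_hours, tb_months):
        # Total credits for a workload: compute cost plus storage cost.
        return (cpu_hours * self.credits_per_cpu_hour
                + tb_months * self.credits_per_tb_month)


@dataclass
class CreditAccount:
    """Credits awarded to a principal investigator in place of funds."""
    balance: float

    def spend(self, provider, cpu_hours, tb_months):
        cost = provider.quote(cpu_hours, tb_months)
        if cost > self.balance:
            raise ValueError("insufficient credits")
        self.balance -= cost
        return cost


def cheapest(providers, cpu_hours, tb_months):
    """Pick the provider that charges the fewest credits for a workload."""
    return min(providers, key=lambda p: p.quote(cpu_hours, tb_months))
```

With made-up prices, a compute-heavy workload and a storage-heavy workload can end up at different providers, which is the behaviour the paragraph above describes.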

This model is very different from the situation today. It shifts the 
initial burden of hardware, data and software maintenance from 
awardees and their institutions to third parties, notably cloud service 
providers. The funding model also has the effect of paying only for 
services used, and aims to create competition in the marketplace, so 
this approach could result in more data science per dollar. 

If the pilot studies at the NIH are successful, it will be important 
to consider the longer-term implications of a commons model. One 
outcome is that data and software usage will be tracked both during 
an award period and after it has expired. Such tracking will yield 
important usage statistics that can inform future funding decisions. 

Research organizations such as the Broad Institute are rapidly evolving 
their practices for storing and accessing biomedical big data. 


UNITING FUNDERS 

The medical research community has too little money to start new 
data resources or to support the growth of more mature databases 
and services. Moreover, current funding schemes do little to foster 
the development of best practices; for example, each data resource is 
usually reviewed in isolation. 

Changes to funding practices need to extend across both agency and 
international borders. Data generation and maintenance are typically 
funded nationally, but the data are used internationally. As a result, we 
need to develop more equitable funding models. The first step is for 
funding agencies to communicate more effectively about data science 
problems and to seek collaborative solutions. Working from the bot- 
tom up, scientists have been doing this for a long time. 

Sustaining the biomedical big-data ecosystem is the responsibility 
of all stakeholders, and will require coordinated efforts among data 
generators, data maintainers, data users, funders, publishers and oth- 
ers in the private sector. The NIH BD2K programme, in collaboration 
with many stakeholders, is beginning to address these issues. 


Philip E. Bourne is associate director for data science at the US 
National Institutes of Health. He was previously associate 
vice-chancellor for innovation and industry alliances at the Office 
of Research Affairs at the University of California, San Diego. 

Jon R. Lorsch is director of the National Institute of General Medical 
Sciences. He was previously professor of biophysics and biophysical 
chemistry at Johns Hopkins University in Baltimore, Maryland. Eric D. 
Green is director of the National Human Genome Research Institute. 
He was previously its scientific director, chief of its genome technology 
branch and director of the NIH Intramural Sequencing Center. 
Q&A Perry Nisen 


Better insights, 
better drugs 


A former paediatric oncologist and molecular biologist with experience in academia and industry, 
Perry Nisen was senior vice-president for science and innovation at GlaxoSmithKline in 2014 
before becoming chief executive at the Sanford Burnham Prebys Medical Discovery Institute in 
La Jolla, California. He discusses the challenges facing drug discovery in the era of big data. 


What are some of the biggest obstacles in 
bringing big data into drug discovery? 
Certainly we'll benefit from integrating large 
data sets, but it is imperative that this is not 
uncoupled from biological investigation. 
One of the challenges in pharma, at a time 
of increased externalization and partnering 
for research, is how to retain deep biological 
insight and connect that to the interrogation 
of these large data sets. One without the other 
is misguided. 

As we move from the lab into the clinic, it 
is useful to study large, longitudinal clinical 
data sets. But again, interrogating those data 
without clinical insight is not very meaningful. 
The big prizes will go to those who connect 
the large clinical data sets with an abundance 
of preclinical data, and bring all that together. 
I don’t think anybody has got that right yet. 

In the academic world, the two new drugs 
to treat high cholesterol that target the pro- 
tein PCSK9 are a great example of making the 
connection. Helen Hobbs of the University of 
Texas Southwestern Medical Center and her 
partners connected formidable genetics, bio- 
logical understanding and chemical insight to 
lead to the important new medicines. 

In the pharma world, Genentech seems to 
have made a long-term investment in biology 
and linked that to clinical data to treat the 
right person with the right drug — witness 
Herceptin for patients with breast cancer. 


How are drug companies figuring out where to 
place their bets? 

Given the attention deficit disorder and exter- 
nalization of research in pharma, and the 
ever-increasing demands of venture capital 
and other financial [...] 
immunotherapy target] right now. 

But we are also seeing examples of pharma 
companies making very big, bold decisions — 
going after certain bespoke immune therapies, 
for example, without any hugely compelling 
evidence that this approach will work. 

I think there is an argument for companies 
externalizing and partnering on research, so 
that they are not missing something at the 
bleeding edge of discovery. At the same time, 
those decisions of when and where you jump in, 
with enough confidence, are truly challenging. 
What is a good example of a tough decision? 
Look at microbiome research. Everyone is 
really excited about the microbiome. It is tan- 
talizing that all these billions of bacteria in our 
gut, skin and everywhere else influence disease 
and how we respond to drugs. We have seen 
associations between particular bacterial flora 
and disease states, but when will we be com- 
fortable enough to invest substantially in mak- 
ing medicines to alter someone's microbiome? 
I don't know the answer to that. 

To date, most of the data set out to define 
the diverse repertoire of microbes are from 
small numbers of individuals. Conducting 
evaluations from larger cohorts of subjects is 
daunting, and long-term clinical data on these 
cohorts have been limited. 

Finally, we lack the robust tests that would 
let us modify the microbiome in a durable way 
and have a meaningful impact on the disease 
process. 


Why does Sanford Burnham Prebys combine 
basic research with drug discovery? 

Pharma often has a disconnect between the 
applied research in making medicines and the 
deep biological insight and ongoing experi- 
mentation that informs it. We have 80 people 
here from pharma who have made making 
medicines their whole career. Our principal 
investigators can work hand-in-hand with 
drug discoverers, and that enables us to pursue 
research and go after targets that nobody else 
will work on because they are too uncertain. 

Building up the big-data element creates a 
unique situation in this work. For example, 
when we start thinking about autoimmunity, 
about what happens when, with which T cells 
and B cells, how do we start disentangling 
those complexities? 

This is where big-data bioinformatics can be 
hugely useful. That to me underscores more 
than ever the need both to analyse large data 
sets and to stay connected to researchers who 
understand all the moving pieces with a view 
from a little higher up. 


Is it difficult to find research staff with skills in 
gathering and analysing big data? 

Yes, we struggle with this. Part of the chal- 
lenge is training and funding individuals who 
are sophisticated enough to pursue that. They 
also tend to be gobbled up by potentially more 
lucrative fields outside the life sciences. 

We have been searching hard to attract and 
recruit the next wave of systems biologists and 
other people who can analyse large amounts of 
data. You want to have a critical mass of people 
doing that together, and it’s really tough. There 
are people who generate data and people who 
analyse data, but there are few who do both 
really well. I believe the winners are the ones 
who can pull it all together. 


INTERVIEW BY ERIC BENDER 


This interview has been edited for length and clarity. 
4 BIG QUESTIONS 

Gathering and understanding the deluge of biomedical research and 
health data poses huge challenges. But this work is rapidly changing 
the face of medicine. 

BY ERIC BENDER 


How can long-term access to biomedical data that are vital for 
research be improved? 

Data storage may be getting cheaper, particularly in cloud computing, 
but the total costs of maintaining biomedical data are too high and 
climbing rapidly. Current models for handling these tasks are only 
stopgaps. 

Researchers, funders and others need to analyse data usage and look 
at alternative models, such as ‘data commons’, for providing access to 
curated data in the long term. Funders also need to incorporate 
resources for doing this. 

“Our mission is to use data science to foster an open digital 
ecosystem that will accelerate efficient, cost-effective biomedical 
research to enhance health, lengthen life and reduce illness and 
disability.” Philip Bourne, US National Institutes of Health. 


How can the barriers to using clinical trial results and patients’ 
health records for research be lowered? 

‘De-identified’ data from clinical trials and patients’ medical 
records offer opportunities for research, but the legal and technical 
obstacles are immense. Clinical study data are rarely shared, and 
medical records are walled off by privacy and security regulations 
and by legal concerns. 

Patient advocates are lobbying for access to their own health data, 
including genomic information. The European Medicines Agency is 
publishing clinical reports submitted as part of drug applications. 
And initiatives such as CancerLinQ are gathering de-identified 
patient data. 

“There’s a lot of genetic information that no one understands yet, so 
is it okay or safe or right to put that in the hands of a patient? The 
flip side is: It’s my information; if I want it, I should get it.” 
Megan O’Boyle, Phelan-McDermid Syndrome Foundation. 


How can knowledge from big data be brought into point-of-care 
health-care delivery? 

Delivering precision medicine will immensely broaden the scope of 
electronic health records. This massive shift in health care will be 
complicated by the introduction of new therapies, requiring ongoing 
education for clinicians who need detailed information to make 
clinical decisions. 

Health systems are trying to bring up-to-date treatments to clinics 
and build ‘health-care learning systems’ that integrate with 
electronic health records. For instance, the CancerLinQ project 
provides recommendations for patients with cancer whose treatment is 
hard to optimize. 

“Developing a standard interface for innovators to access the 
information in electronic health records will connect the point of 
care to big data and the full power of the web, spawning an ‘app 
store’ for health.” Kenneth Mandl, Harvard Medical School. 


Can academia create better career tracks for bioinformaticians? 

The lack of attractive career paths in bioinformatics has led to a 
shortage of scientists that have both strong statistical skills and 
biological understanding. The loss of data scientists to other fields 
is slowing the pace of medical advances. 

Research institutions will take steps, including setting up formal 
career tracks, to reward bioinformaticians who take on 
multidisciplinary collaborations. Funders will find ways to better 
evaluate contributions from bioinformaticians. 

“Perhaps the most promising product of big data, that labs will be 
able to explore countless and unimagined hypotheses, will be stymied 
if we lack the bioinformaticians that can make this happen.” Jeffrey 
Chang, University of Texas. 

Eric Bender is a freelance science writer based in Newton, Massachusetts. 


5 NOVEMBER 2015 | VOL 527 | NATURE | S19 
© 2015 Macmillan Publishers Limited. All rights reserved 


